ABSTRACT



Software products evolve over time. Sometimes they evolve by adding new features, 
and sometimes by either fixing bugs or replacing outdated implementations with new 
ones. When software engineers fail to anticipate such evolution during development, 
they will eventually be forced to re-architect or re-build from scratch. Therefore, it has 
been common practice to prepare for changes so that software products are extensible 
over their lifetimes. However, making software extensible is challenging because it is 
difficult to anticipate successive changes and to provide adequate abstraction mech- 
anisms over potential changes. Such extensibility mechanisms, furthermore, should 
not compromise any existing functionality during extension. Software engineers would 
benefit from a tool that provides a way to add extensions in a reliable way. It is nat- 
ural to expect programming languages to serve this role. Extensible programming is 
one effort to address these issues. 

In this thesis, we present type safe extensible programming using the MLPolyR 
language. MLPolyR is an ML-like functional language whose type system provides 
type-safe extensibility mechanisms at several levels. After presenting the language, 
we will show how these extensibility mechanisms can be put to good use in the 
context of product line engineering. Product line engineering is an emerging software 
engineering paradigm that aims to manage variations, which originate from successive 
changes in software. 
CHAPTER 1
INTRODUCTION 



Software products evolve over time. Sometimes they evolve by adding new features, 
and sometimes by fixing bugs that a previous release introduced. In other cases, 
they evolve by replacing outdated implementations with better ones. Unless software 
engineers anticipate such evolution during development, they will eventually be forced
to re-implement their systems from scratch. Therefore, it has become common practice
to prepare for extensibility when we design a software system so that it can evolve 
over its lifetime. For example, look at the recent release history of the SML/NJ 
compiler: 

• 1/13/09. v110.69. Add new concurrency instructions to MLRISC. Fix problem
with CM tools.

• 9/17/08. v110.68. Improve type checking and type error messages. Re-implement
the RegExp library. Fix bugs in ml-ulex. Update documentation. Add NLFFI
support in Microsoft Windows.

• 11/15/07. v110.67. Fix performance bugs. Support Mac OS X 10.5 (Leopard)
on both Intel and PPC Macs. Drop support for Windows 95 and 98.

The SML/NJ compiler has evolved by means of adding and replacing functionality 
since its birth around the early 1990s. Interestingly, its evolution is sequential in that
all its changes have been integrated together into a new release (Buckley et al. 2005).
In this scenario, we are interested in easily adding extensions to an existing system,
and therefore extensibility mechanisms become our major concern. Furthermore, we
would like to have extensibility mechanisms which do not compromise any functions 
in the base system. Hence, software engineers need a tool that provides a way to 
add extensions in a reliable way, and it is natural to expect programming languages 
to function in this way. Functional languages such as SML and Haskell have already
improved safety in the sense that "well-typed programs do not go wrong" (Milner
1978b). Beyond this, we would like to have a language safe enough to guarantee that
nothing bad happens during extensions. This approach will work well for sequential 
evolution since extensible languages make it easy to extend one version into another 
in a reliable way. 

There are many cases, however, where software changes can not be integrated into 
the original product, and as a result, different versions begin to coexist. Moreover, 
there are even situations where such divergence is planned from the beginning. A 
marketing plan may introduce a product lineup with multiple editions. Windows 
Vista, which ships in six editions, is such an example. These editions are roughly 
divided into two target markets, consumer and business, with editions varying to
meet the specific needs of a large spectrum of customers (Microsoft 2006). Then,
each edition may evolve independently over time. Unless we carefully manage each
change in different editions, multiple versions that originate from one source start to 
coexist separately. They quickly become so incompatible that they require separate 
maintenance, even though much of their code is duplicated. This quickly leads to a 
maintenance nightmare. In such a case, the role of programming languages becomes
limited and, instead, we need a way of managing variability in a product lineup.
Svahnberg studies the relationship between variability and evolution, as shown
in Figure 1.1, where product variations and product releases span two dimensions. As
his figure suggests, a set of products evolves over time just as one product does. Any
extensibility mechanism which does not take these two dimensions into consideration
cannot fully provide satisfactory solutions.

Figure 1.1: Variability and evolution (Svahnberg 2000).

In this thesis, we propose type-safe extensible programming which takes these two
dimensions into consideration. In particular, our language provides extensibility
mechanisms at multiple levels of granularity, from fine-grained (at the core expression
level) to coarse-grained (at the module level). At the same time, in order to
manage variability, we adopt product line engineering as a development paradigm, and
then provide a development process which guides how to apply this paradigm to our
extensibility mechanisms:

• A core language that supports polymorphic extensible records, first-class cases
and type-safe exception handling (Section 3);

• A module system that supports separate compilation in the presence of the
above features (Section 4);

• A development process that supports the construction of a family of systems
(Section 5).



CHAPTER 2 
RELATED WORK 



Extensible programming is a programming style that focuses on mechanisms to extend 
a base system with additional functionality. The main idea of extensible programming 
is to use the existing artifacts (e.g., code, documents, or binary executables) but
extend them to fit new requirements. Extensibility mechanisms take an important
role in simplifying such activities. Building extensible systems has received attention
because it is seen as a way to reduce the development cost by reusing the existing
code base rather than developing it from scratch. Furthermore, nowadays software
products need to support extensibility from the beginning since the current computing
environment demands a high level of adaptability by software products. Extensible
programming provides language features designed for extensibility in order to simplify
the construction of extensible systems. In the remainder of this section, we will study 
similar works that take extensibility and adaptability in software into consideration. 



2.1 The extensible language approach 



Software evolves by means of adding and/or replacing its functionality over time. Such
extensibility has been studied extensively in the context of compilers and programming
languages. Previous work on extensible compilers has proposed new techniques
on how to easily add extensions to existing programming languages and their compilers.
For example, JaCo is an extensible compiler for Java based on extensible algebraic
types (Zenger and Odersky 2001, 2005). The Polyglot framework implements
an extensible compiler where even changes of compilation phases and manipulation
of internal abstract syntax trees are possible (Nystrom et al. 2003). Aspect-oriented
concepts are also applied to extensible compiler construction (Wu et al. 2005).

However, most of these existing solutions do not attempt to pay special attention
to the set of extensions they produce. Extensions are best accomplished if the
original code base was designed for extensibility. Even worse, successive extensions
can make the code base difficult to learn and hard to change substantially. For example,
the GNU Compiler Collection (GCC) started as an efficient C compiler but
has evolved to officially support more than seven programming languages and a large
number of target architectures. However, a variety of source languages and target
architectures have resulted in a complexity that makes it difficult to do GCC development
(Vichare 2008). This effect apparently even led to some rifts within the GCC
developer community (Matzan 2003).



2.2 The design patterns approach 

In software engineering, extensibility is one kind of design principle where the goal is 
to minimize the impact of future changes on existing system functions. Therefore, it 
has become common practice to prepare for future changes when we design systems.
The concept of design patterns takes an important role in this context (Gamma et al.
1995). Each pattern provides design alternatives which take changes into consideration
so that the system is robust enough to accommodate such changes. For example,
the visitor pattern makes it easy to define a new operation without changing the
classes of the members on which it is performed. It is particularly useful when the
classes defining the object structure rarely change. By clearly defining intent, applicability
and consequences of their application, patterns will help programmers manage
changes.

However, design patterns are not generally applicable to non-object-oriented languages.
Even worse, Norvig shows how it is trivial to implement various design
patterns in dynamic languages (Norvig 1998). Some argue that design patterns are
just workarounds for missing language features (Monteiro 2006).



2.3 The feature-oriented programming approach 



Product line engineering is an emerging paradigm for construction of a family of
products (Kang et al. 2002; Lee et al. 2002; SEI 2008). This paradigm encourages
developers to focus on developing a set of products rather than on developing one
particular product. Therefore, mechanisms for managing variability through the design
and implementation phases are essential. While most efforts in product line
engineering have focused on principles and guidelines, only a few have suggested
concrete mechanisms of implementing variations. Consequently, their process-centric
approach is too abstract to provide a working solution in a particular language. For
example, the Feature-Oriented Reuse Method (FORM) often suggests parameterization
techniques, but implementation details are left to developers (Kang et al. 1998;
Lee et al. 2000). Therefore, preprocessors, e.g., macro systems, have been used in
many examples in the literature as the feature delivery method (Kang et al. 1998,
2005). For example, the macro language in FORM determines inclusion or exclusion
of some code segments based on the feature selection. Macro languages have some
advantage in that they can be mixed easily with any target programming language.
However, feature-specific segments are scattered across multiple classes, so code can
easily become complicated. Even worse, since general purpose compilers do not understand
the macro language, any error appearing in feature code segments cannot
be detected until all feature sets are selected and the corresponding code segments
are compiled.

In order to take advantage of the current compiler technology including static typing
and separate compilation, we need native language support. Therefore, feature-oriented
programming emerges as an attempt to provide better support for feature
modularity (Lopez-Herrejon et al. 2005). FeatureC++ (Apel et al. 2005) and
AHEAD (Batory 2004) are such language extensions to C++ and Java, respectively.
In these approaches, features are implemented as distinct units and then they are
combined to become a product. However, there still is no formal type system, so
these languages do not guarantee the absence of type errors during feature composition
(Thaker et al. 2007). Recently, such a formal type system has been proposed for
a simple, experimental feature-oriented language (Apel et al. 2008).



2.4 The generic programming approach 

The idea of generic programming is to implement the common part once and parameterize
variations so that different products can be instantiated by assigning distinct
values as parameters. Higher-order modules, also known as functors (e.g., in
the Standard ML programming language (SML)), are a typical example in that they
can be parameterized on values, types and even other modules, possibly including
higher-order ones (Appel and MacQueen 1991). The SML module system has been
demonstrated to be powerful enough to manage variations in the context of product
lines (Chae and Blume 2008).



However, its type system sometimes imposes restrictions which require code duplication
between functions on data types. Many proposals to overcome this restriction
have been presented. For example, MLPolyR proposes extensible cases (Blume et al.
2006), and OCaml proposes polymorphic variants (Garrigue 2000).



Similarly, templates in C++ provide parameterization over types and have been
extensively studied in the context of programming families (Czarnecki and Eisenecker
2000). Recently, an improvement that would provide better support of generic programming
has been proposed (Dos Reis and Stroustrup 2006). Originally, Java and
C# did not support parameterized types but now both support similar concepts
(Torgersen 2004; Garcia et al. 2003).



Sometimes the generic programming approach is criticized for its difficulty in identifying
variation points and defining parameters (Gacek and Anastasopoules 2001).
However, systematic reasoning (e.g., product line analysis done by product line architects)
can ease this burden by providing essential information for product line
implementation (Chae and Blume 2008).



2.5 The generative programming approach 

Generative programming is a style of programming that utilizes code generation techniques
which make it possible to generate code from generic artifacts such as specifications,
diagrams, and templates (Czarnecki 2004). This approach is similar to the



generic programming approach in that a specialized program can be obtained from a 
generic one, but the generative programming approach focuses on the usage of domain 
specific languages and their code generators while the generic programming approach 
focuses on the usage of the built-in language features such as templates and functors. 



2.6 The open programming approach 



Extensions can be added generally by modifying source code. In this compile-time
form of extensions, a program needs to be compiled for extensions to become available.
However, in some cases, a software product needs to modify its behavior dynamically
during its execution. Non-stop applications are such examples. Sometimes, a certain
type of change can be arranged to be picked up by a linker during load-time. Open
programming is an attempt at addressing these issues in the context of programming
languages. For instance, Java can dynamically load (class) libraries for this purpose.
Rossberg proposes the Alice ML programming language which reconciles open
programming concepts with strong typing (Rossberg 2007).

Similarly, there have been attempts to upgrade software while it is running. Appel
illustrated the usage of "applicative" module linking to demonstrate how to replace a
software module without having the downtime (Appel 1994). However, it was Erlang
that made this "hot-sliding" or "hot code swapping" idea popular (Armstrong 2007).
In Erlang, old code can be phased out and replaced by new code, which makes it
easier to fix bugs and upgrade a running system.

Unlike these approaches, we focus on compile-time extensions made by modifying source
code with minimal effort.



CHAPTER 3 
TYPE SAFE EXTENSIBLE PROGRAMMING 



3.1 Introduction 

The MLPolyR language has been specifically designed to support type-safe extensi- 
ble programming at a relatively fine degree of granularity. Its records are polymorphic 
and extensible unlike in most programming languages where records must be explic- 
itly declared and are not extensible. As their duals, polymorphic sums with extensible 
cases make composable extensions possible. Moreover, by taking advantage of repre- 
senting exceptions as sums and assigning exception handlers polymorphic, extensible 
row types, we can provide type-safe exception handling, which suggests "well-typed 
programs do not have uncaught exceptions." 

To understand the underlying mechanism, it is instructive to first look at an 
example. The following sections informally provide such examples that highlight the 
extensible aspect of the MLPolyR language. Then, we show how these constructs
provide a solution to the expression problem which is considered one of the most
fundamental problems in the study of extensibility (Section 3.2).

Theoretical aspects of this language (derived from the previously published conference
papers (Blume et al. 2006, 2008)) are presented in the following sections.



First, we consider an implicitly typed external language EL that extends $\lambda$-calculus
with polymorphic extensible records, extensible cases and exceptions. Our implementation
rests on a deterministic, type-sensitive semantics for EL based on elaboration
(i.e., translation) into an explicitly typed internal language IL. The elaboration process
involves type inference for EL. Our compiler for MLPolyR provides efficient
type reconstruction of principal types by using a variant of the well-known algorithm
W (Milner 1978a). Finally, IL is translated into a variant of an untyped language,
called LRec, which is closer to machine code. Therefore, our compiler is structured
as follows:



    EL (implicitly typed; Section 3.3)
      |  CPS and dual transformation
      v
    IL (explicitly typed; Section 3.4)
      |  index passing
      v
    LRec (untyped $\lambda$-calculus; Section 3.5)



3.1.1 Polymorphic extensible records 

MLPolyR supports polymorphic extensible records. One of its record expressions 
has the form {a = e, ... = r }. This creates a new record which extends record 
r with a new field a. Table 3.1 shows more such record operations. Record update
and renaming operations can be derived by combining extension and subtraction 
operations. 

To understand the extension mechanism, let us first look at an example. Since 
records are first-class values, we can abstract over the record being extended and 
obtain a function add_a that extends any argument record (as long as it does not 
already contain a) with a field a. Such a function can be thought of as the "difference" 
between its result and its argument: 

fun add_a r = {a = 1, ... = r}
Selection
    val sel_a : $\forall\beta : \{a\}.\ \forall\alpha.\ \{a : \alpha, \beta\} \to \alpha$
    fun sel_a r = r.a

Extension
    val add_a : $\forall\beta : \{a\}.\ \{\beta\} \to \{a : \mathsf{int}, \beta\}$
    fun add_a r = {a = 1, ... = r}

Subtraction
    val sub_a : $\forall\beta : \{a\}.\ \forall\alpha.\ \{a : \alpha, \beta\} \to \{\beta\}$
    fun sub_a {a = _, ... = r} = r

Update
    val upd_a : $\forall\beta : \{a\}.\ \forall\alpha.\ \{a : \alpha, \beta\} \to \{a : \mathsf{int}, \beta\}$
    fun upd_a r = let {a = _, ... = rest} = r in {a = 1, ... = rest}

Rename
    val ren_a : $\forall\beta : \{a, b\}.\ \forall\alpha.\ \{a : \alpha, \beta\} \to \{b : \alpha, \beta\}$
    fun ren_a r = let {a = a', ... = rest} = r in {b = a', ... = rest}

Table 3.1: Basic operations on records in MLPolyR.



Here the difference consists of a field labeled a of type int and value 1. The type
of function add_a is inferred as $\forall\beta : \{a\}.\ \{\beta\} \to \{a : \mathsf{int}, \beta\}$ where $\beta : \{a\}$ represents
a constraint that a row variable $\beta$ must not contain a label a. We can write similar
functions add_b and add_c which add fields b of type bool and c of type string
respectively:

fun add_b r = {b = true, ... = r}
fun add_c r = {c = "hello", ... = r}

We can then "add up" record differences represented by add_a, add_b, add_c by 
composing these functions: 

fun add_ab r = add_a (add_b r)
fun add_bc r = add_b (add_c r)

where the inferred types are respectively: 



val add_ab : $\forall\beta : \{a, b\}.\ \{\beta\} \to \{a : \mathsf{int}, b : \mathsf{bool}, \beta\}$
val add_bc : $\forall\beta : \{b, c\}.\ \{\beta\} \to \{b : \mathsf{bool}, c : \mathsf{string}, \beta\}$

Finally, we can create actual records by "adding" differences to the empty record:

val a = add_a {}
val ab = add_ab {}
val bc = add_bc {}
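
The composition of record differences can be imitated in a dynamically typed setting. The following Python sketch is only an illustration and not part of MLPolyR: records are modeled as dictionaries, the helper extend is a hypothetical name, and the run-time check inside it stands in for the absence constraint on the row variable, which MLPolyR verifies statically.

```python
def extend(record, label, value):
    """Return a new record with `label` added; the argument is left untouched."""
    # Dynamic stand-in for the static absence constraint on the row variable.
    if label in record:
        raise KeyError(f"label {label!r} is already present")
    return {**record, label: value}


# Each function is a "difference" between its result and its argument.
def add_a(r):
    return extend(r, "a", 1)


def add_b(r):
    return extend(r, "b", True)


def add_c(r):
    return extend(r, "c", "hello")


# Differences are "added up" by ordinary function composition.
def add_ab(r):
    return add_a(add_b(r))


def add_bc(r):
    return add_b(add_c(r))


# Actual records are obtained by applying differences to the empty record.
a = add_a({})
ab = add_ab({})
bc = add_bc({})
```

Unlike in MLPolyR, an attempt such as add_a(a) is only rejected when it runs; the point of the row-typed formulation is that the same mistake is reported by the type checker.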



Records as classes 



Extensible records continue to receive attention since they can also be used as a
type-theoretical basis for object-oriented programming (Remy and Vouillon 1998). For
example, assuming polymorphic records and references in place, we can define a base
class, and then create sub-classes with additional methods in order to obtain the same
effect of code reuse via inheritance.

As a demonstration of records as classes (following Pierce's encoding (Pierce
2002)), we first define a counter class which provides two methods: 1) get returns
the current value of a field i by dereferencing and 2) inc increments its value by first
reading it and then assigning the incremented value, as follows:



val counterClass = fn x =>
    { get = fn _ => x!i,
      inc = fn _ => x!i := x!i + 1
    }

where ! is a dereferencing operator and := is an assignment operator. Then, individual 
counter objects can be obtained by a counter generator newCounter which applies 
counterClass to a record with a reference field i: 



val newCounter = fn _ => let
    val x = {| i = 1 |}
  in counterClass x
  end
where {| ... |} denotes a mutable record. Furthermore, by taking advantage of extensible
records, we can implement a subclass resetCounterClass which extends the base
class counterClass with a new method reset like this:

val resetCounterClass = fn x =>
    { ... = counterClass x,
      reset = fn _ => x!i :=
    }

where ... refers to the same fields that the base class contains, so the returned value
contains one more field named reset. Similarly, individual resetCounter objects can be
obtained by a generator newResetCounter:

val newResetCounter = fn _ => let
    val x = {| i = 1 |}
  in resetCounterClass x
  end
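
The records-as-classes encoding can be mimicked in Python as a rough sketch. Dictionaries of closures play the role of records of methods, a dictionary with a field "i" plays the role of the mutable record, and dictionary unpacking plays the role of record extension. The snake_case names are hypothetical, and the value written by reset (0 here) is an assumption of this sketch.

```python
def counter_class(x):
    """Base class: build the method record over the shared mutable state x."""
    def get():
        return x["i"]

    def inc():
        x["i"] = x["i"] + 1

    return {"get": get, "inc": inc}


def new_counter():
    # Object generator: allocate fresh state, then apply the class to it.
    return counter_class({"i": 1})


def reset_counter_class(x):
    """Subclass: all fields of the base class plus one new method."""
    def reset():
        x["i"] = 0  # assumed reset value, chosen for this sketch

    return {**counter_class(x), "reset": reset}


def new_reset_counter():
    return reset_counter_class({"i": 1})
```

Because the subclass applies counter_class to the same state x, the inherited methods and the new method observe one another's updates, which is the effect inheritance is meant to provide.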



3.1.2 Extensible programming with first-class cases 
Variants are dual to records in the same manner as logical $\vee$ is dual to $\wedge$:

$\neg\{a \wedge b\} = (\neg a \vee \neg b)$
$\neg(a \vee b) = \{\neg a \wedge \neg b\}$

Then, as in any dual construction, the introduction form of the primal corresponds 
to the elimination form of the dual. Thus, elimination forms of sums (e.g., match) 
correspond to introduction forms of records. In particular, record extension (an in- 
troduction form) corresponds to the extension of cases (an elimination form). This 
duality motivates making cases first-class values as opposed to a mere syntactic form. 
With cases being first-class and extensible, one can use the usual mechanisms of functional
abstraction in a style of programming that facilitates composable extensions.

Here is a function representing the difference between two code fragments, one of
which can handle case `A while the other, represented by the argument c, cannot:

fun add_A c = cases `A () => print "A"
              default: c

where data type constructors (`A ()) are represented by prefixing their names with a
backquote character `. Note that function add_A corresponds to add_a of the dual (in
Section 3.1.1). The type inferred for add_A is
$\forall\beta : \{\text{`A}\}.\ (\langle\beta\rangle \hookrightarrow ()) \to (\langle\text{`A} : (), \beta\rangle \hookrightarrow ())$
where a type $\langle\rho\rangle \hookrightarrow \tau$ denotes the type of first-class cases, $\langle\rho\rangle$ is the sum type that
is being handled, and $\tau$ is the result. We also assume that () denotes a unit type.

Examples for functions add_B and add_C (corresponding to add_b and add_c in the 
dual) are: 

fun add_B c = cases `B () => print "B"
              default: c
fun add_C c = cases `C () => print "C"
              default: c

As in the dual, we can now compose difference functions to obtain larger differences: 

fun add_AB c = add_A (add_B c) 
fun add_BC c = add_B (add_C c) 

By applying a difference to the empty case nocases we obtain case values: 

val case_A = add_A nocases 
val case_AB = add_AB nocases 
val case_BC = add_BC nocases 
These values can be used in a match form. The match construct is the elimination
form for the case arrow $\hookrightarrow$. The following expression will cause "B" to be printed:

match `B () with case_BC
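
The dual construction can be sketched in Python along the same lines as records. This is an illustration under stated assumptions rather than MLPolyR's semantics: a first-class case is modeled as a dictionary from constructor labels to handler functions, a variant value as a (label, payload) pair, and the names add_case, NOCASES and match are hypothetical. For ease of inspection the handlers return their string instead of printing it.

```python
NOCASES = {}  # the empty case: handles no constructor at all


def add_case(label, handler, default):
    """Extend the case `default` with a handler for `label`."""
    if label in default:
        raise KeyError(f"case {label!r} is already handled")
    return {**default, label: handler}


# Difference functions: each one adds the ability to handle one constructor.
def add_A(c):
    return add_case("A", lambda _payload: "A", c)


def add_B(c):
    return add_case("B", lambda _payload: "B", c)


def add_C(c):
    return add_case("C", lambda _payload: "C", c)


def add_AB(c):
    return add_A(add_B(c))


def add_BC(c):
    return add_B(add_C(c))


case_A = add_A(NOCASES)
case_AB = add_AB(NOCASES)
case_BC = add_BC(NOCASES)


def match(value, cases):
    """Elimination form: dispatch a variant value to the matching handler."""
    label, payload = value
    return cases[label](payload)


result = match(("B", ()), case_BC)
```

In this dynamic model, matching a value against a case that lacks its label fails with a KeyError at run time. The row types described above exist to rule out that failure before the program runs.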



The previous examples demonstrate how functional record extension in the primal
corresponds to code extension in the dual. The latter feature gives rise to a simple
programming pattern facilitating composable extensions. Composable extensions can
be used as a principled approach to solving the well-known expression problem described
by Wadler (Wadler 1998). We will show how our composable extensions
provide a solution to the expression problem in the following section (Section 3.2).



3.1.3 Exception handlers as extensible cases 

Exceptions are an indispensable part of modern programming languages. They are, 
however, handled poorly, especially by higher-order languages such as ML and Haskell: 
in both languages a well-typed program can unexpectedly fail due to an uncaught ex- 
ception. MLPolyR enriches the type system with type-safe exception handling by 
relying on representing exceptions as sums and assigning exception handlers polymor- 
phic, extensible row types. Our syntax distinguishes between the act of establishing a 
new exception handler (handle) and that of overriding an existing one (rehandle). 
The latter can be viewed as a combination of unhandle (which removes an existing 
handler) and handle. This design choice makes it possible to represent exception 
types as row types without need for additional complexity. From a usability perspec- 
tive, the design makes overriding a handler explicit, reducing the likelihood of this 
happening by mistake. 
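
The relationship between handle, rehandle and unhandle can be made concrete with a small dynamic model. The Python sketch below is a simplification and not the thesis's implementation: the set of installed handlers is modeled as a dictionary from exception labels to handler functions, raising an exception is modeled as a lookup followed by a call, and the name raise_exn is hypothetical. The presence and absence checks that MLPolyR performs statically on rows are replaced by run-time checks.

```python
def handle(handlers, label, handler):
    """Establish a NEW handler; the label must not be handled already."""
    if label in handlers:
        raise KeyError(f"{label!r} is already handled; use rehandle to override")
    return {**handlers, label: handler}


def unhandle(handlers, label):
    """Remove an existing handler; the label must currently be handled."""
    if label not in handlers:
        raise KeyError(f"{label!r} is not handled")
    return {k: v for k, v in handlers.items() if k != label}


def rehandle(handlers, label, handler):
    """Override an existing handler: unhandle followed by handle."""
    return handle(unhandle(handlers, label), label, handler)


def raise_exn(handlers, label, payload):
    """Raising is dispatch on the label, as in matching against a case."""
    return handlers[label](payload)
```

Because handle refuses to shadow an existing handler, overriding can only happen through rehandle, which is the explicitness the design above aims for.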

We will now visit a short sequence of simple program fragments, roughly ordered 
by increasing complexity. None of the examples exhibits uncaught exceptions. The
rejection of any one of them by a compiler would constitute a false positive. The type 
system and the compiler that we describe accept them all. 

Of course, baseline functionality consists of being able to match a manifest occur- 
rence of a raised exception with a manifestly matching handler: 

(... raise `Neg 10 ...) handle `Neg i => ...

The next example moves the site where the exception is raised into a separate function.
To handle this in the type system, the function type constructor acquires an
additional argument $\rho$ representing the set of exceptions that may be raised by an
application, i.e., function types have the form $\tau_1 \xrightarrow{\rho} \tau_2$. This is about as far as
existing static exception trackers that are built into programming languages (e.g.,
Java's throws declaration) go.

fun foo x = if x < 0 then raise `Neg x else ...
(... foo y ...) handle `Neg i => ...

But we also want to be able to track exceptions through calls of higher-order functions 
such as map, which themselves do not raise exceptions while their functional arguments 
might: 

fun map f [] = []
  | map f (x :: xs) = f x :: map f xs

(... map f l ...) handle `Neg i => ...

Moreover, in the case of curried functions and partial applications, we want to be 
able to distinguish stages that do not raise exceptions from those that might. In 
the example of map, there is no possibility of any exception being raised when map is 
partially applied to the function argument; all exceptions are confined to the second
stage when the list argument is supplied:

val mfoo = map foo
(... mfoo l ...) handle `Neg i => ...

Here, the result mfoo of the partial application acts as a data structure that carries a 
latent exception. In the general case, exception values can occur in any data structure.
For example, the SML/NJ Library (Gansner and Reppy 2002) provides a constructor



function for hash tables which accepts a programmer-specified exception value which 
becomes part of the table's representation from where it can be raised, for example 
when an attempt is made at looking up a non-existing key. 

The following example shows a similar but simpler situation. Function check finds 
the first pair in the given list whose left component does not satisfy the predicate ok. 
If such a pair exists, its right component, which must be an exception value, is raised. 
To guarantee exception safety, the caller of check must be prepared to handle any 
exception that might be passed along in the argument of the call: 

fun check ((x, e) :: rest) = if ok x then check rest else raise e
  | check [] = ()

(... check [(3, `A 10), (4, `B true)] ...) handle `A i => ...
                                                | `B b => ...

Finally, exception values can participate in complex data flow patterns. The following
example illustrates this by showing an exception `A that carries another exception
`B as its payload. The payload `B 10 itself gets raised by the exception handler for
`A in function f2, so a handler for `B on the call of f2 suffices to make this fragment
exception-safe:

fun f1 () = ... raise `A (`B 10) ...
fun f2 () = f1 () handle `A x => raise x
(... f2 () ...) handle `B i => ...



3.2 Case study: A two-way extensible interpreter 

There are two axes along which we can extend a system: functionality and variety of
data. For the first axis, we can add more functionality on the basic set of data. For
the second axis, we can add to the variety of data on which the basic functions perform.
Ideally, two-dimensional extensions should be orthogonal. However, depending
on the context, extensions along one axis can be more difficult than along the other.
Simultaneous two-way extensions can be even more difficult. This phenomenon can
be easily explained in terms of expressions (data) and evaluators (functions), which
is the reason Wadler called it the expression problem (Wadler 1998). This section
discusses a two-way extensible interpreter that precisely captures this phenomenon. Our
intention with this case study is to define a real yet simple example that extends its
functionality in an interesting way.



Base language 

Let us consider a Simple Arithmetic Language (SAL) that contains terms such as
numbers, variables, additions, and a let-binding form. Not all expressions that
conform to the grammar are actually "good" expressions. We want to reject expressions
that have "dangling" references to variables which are not in scope. The judgment
$\Gamma \vdash e\ \mathsf{ok}$ expresses that $e$ is an acceptable expression if it appears in a context
described by $\Gamma$. In this simple case, $\Gamma$ keeps track of which variables are currently in
scope, so we take it to be a set of variables. An expression is acceptable as a program
if it is an expression that makes no demands on its context, i.e., $\emptyset \vdash e\ \mathsf{ok}$. When
discussing the dynamic semantics of a language, we need to define its values, i.e., the
results of a computation. In SAL, values are simply natural numbers. Then, our
evaluation semantics describes the entire evaluation process as one "big step". We
write $(E, e) \Downarrow n$ to say that $e$ evaluates to $n$ under environment $E$. The environment
is a finite mapping from variables to values.

$$
\begin{array}{llcl}
\text{Values} & n & \in & \mathbb{N} \\
\text{Variables} & x & \in & \mathit{Var} \\
\text{Terms} & e & ::= & n \mid x \mid e + e \mid \mathsf{let}\ x = e\ \mathsf{in}\ e \\
\text{Typing env.} & \Gamma & ::= & \emptyset \mid \Gamma, x \\
\text{Environment} & E & \in & \mathit{Var} \to \mathbb{N}
\end{array}
$$

Static semantics, $\Gamma \vdash e\ \mathsf{ok}$:

$$
\frac{}{\Gamma \vdash n\ \mathsf{ok}}
\qquad
\frac{x \in \Gamma}{\Gamma \vdash x\ \mathsf{ok}}
\qquad
\frac{\Gamma \vdash e_1\ \mathsf{ok} \quad \Gamma \vdash e_2\ \mathsf{ok}}{\Gamma \vdash e_1 + e_2\ \mathsf{ok}}
\qquad
\frac{\Gamma \vdash e_1\ \mathsf{ok} \quad \Gamma, x \vdash e_2\ \mathsf{ok}}{\Gamma \vdash \mathsf{let}\ x = e_1\ \mathsf{in}\ e_2\ \mathsf{ok}}
$$

Evaluation semantics, $(E, e) \Downarrow n$:

$$
\frac{}{(E, n) \Downarrow n}
\qquad
\frac{E(x) = n}{(E, x) \Downarrow n}
\qquad
\frac{(E, e_1) \Downarrow n_1 \quad (E, e_2) \Downarrow n_2 \quad n_1 + n_2 = n}{(E, e_1 + e_2) \Downarrow n}
\qquad
\frac{(E, e_1) \Downarrow n_1 \quad (E[x \mapsto n_1], e_2) \Downarrow n_2}{(E, \mathsf{let}\ x = e_1\ \mathsf{in}\ e_2) \Downarrow n_2}
$$

Figure 3.1: The Simple Arithmetic Language (SAL): syntax (top), the static semantics
(2nd) and the evaluation semantics (bottom).
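To make the two judgments concrete, here is a minimal Python sketch of the static and the evaluation semantics of SAL. It is not the MLPolyR implementation of Figure 3.2: the encoding of terms as tagged tuples such as `("Plus", e1, e2)`, the use of a frozenset for the set of variables in scope and of a dict for the environment, and the names `check`, `evaluate` and `interp` are all assumptions made for illustration.

```python
def check(e, gamma):
    """Static semantics: raise NameError if e uses a variable not in gamma."""
    tag = e[0]
    if tag == "Num":
        return
    if tag == "Var":
        if e[1] not in gamma:
            raise NameError("unbound variable: " + e[1])
        return
    if tag == "Plus":
        check(e[1], gamma)
        check(e[2], gamma)
        return
    if tag == "Let":
        _, x, e1, e2 = e
        check(e1, gamma)
        check(e2, gamma | {x})  # x is in scope only in the body
        return
    raise ValueError("unknown term: " + repr(e))


def evaluate(e, env):
    """Big-step evaluation: return n such that (env, e) evaluates to n."""
    tag = e[0]
    if tag == "Num":
        return e[1]
    if tag == "Var":
        return env[e[1]]
    if tag == "Plus":
        return evaluate(e[1], env) + evaluate(e[2], env)
    if tag == "Let":
        _, x, e1, e2 = e
        return evaluate(e2, {**env, x: evaluate(e1, env)})
    raise ValueError("unknown term: " + repr(e))


def interp(e):
    """A program is checked in the empty context, then evaluated."""
    check(e, frozenset())
    return evaluate(e, {})
```

For instance, `interp(("Let", "x", ("Num", 2), ("Plus", ("Var", "x"), ("Num", 3))))` first checks the term against the empty context and then evaluates it, while a term with a dangling variable is rejected by `check` before evaluation starts.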

Figure 3.2 shows a simple implementation for the base interpreter which is the
composition of the function check (realizing the static semantics) and eval (realizing
the evaluation semantics). As explained in Section 3.1.2, our language MLPolyR has
polymorphic sum types. The type system is based on Rémy-style row polymorphism,
handles equi-recursive types, and can infer principal types for all language constructs.
For function eval in Figure 3.2, the compiler calculates the following type:

$$
\begin{array}{l}
\texttt{val eval} : \\
\quad \forall \beta{:}\emptyset.\ \big(\alpha\ \texttt{as}\ \langle\, \texttt{'Let of}\ (\texttt{string}, \alpha, \alpha), \\
\qquad\qquad\qquad\quad \texttt{'Num of int}, \\
\qquad\qquad\qquad\quad \texttt{'Plus of}\ (\alpha, \alpha), \\
\qquad\qquad\qquad\quad \texttt{'Var of string} \,\rangle,\ \ \texttt{string} \xrightarrow{\beta} \texttt{int}\big) \xrightarrow{\beta} \texttt{int}
\end{array}
$$

Here $\alpha$ is a recursive sum type, indicated by the keyword as and a type row enclosed in
$\langle \ldots \rangle$. $\beta$ is a row type variable constrained to a particular kind representing a set
of labels that must be absent in any instantiation.

Preparation for extensions 

Because it is desirable to extend the base language with new language features, we
had better prepare for language extensions. In MLPolyR, first-class extensible cases
can be helpful to make code extensible. Cases have an elimination form,
match $e_1$ with $e_2$, where $e_1$ is a scrutinee and $e_2$ is a case expression. First, we
separate cases from the scrutinee in the match expression. Then, we parameterize
them by closing over their free variables. One of these free variables is the recursive
instance of the current function itself. This design achieves open recursion. With
this setting, it becomes easy to add a new variant (i.e., new cases). For example,
Figure 3.3 shows how the old function check becomes a pair of check_case and check. The
new version of eval follows the same pattern. For eval_case, the compiler calculates
the following type, which shows that its return type is a case type of the form
$\langle \rho \rangle \hookrightarrow \tau$:

$$
\begin{array}{l}
\texttt{val eval\_case} : \\
\quad \forall \beta{:}\emptyset.\ \big((\alpha,\ \texttt{string} \xrightarrow{\beta} \texttt{int}) \xrightarrow{\beta} \texttt{int},\ \ \texttt{string} \xrightarrow{\beta} \texttt{int}\big) \xrightarrow{\beta} \\
\qquad \big(\langle\, \texttt{'Let of}\ (\texttt{string}, \alpha, \alpha), \\
\qquad\ \ \ \texttt{'Num of int}, \\
\qquad\ \ \ \texttt{'Plus of}\ (\alpha, \alpha), \\
\qquad\ \ \ \texttt{'Var of string} \,\rangle \overset{\beta}{\hookrightarrow} \texttt{int}\big)
\end{array}
$$

    (* environment *)
    fun bind (a, x, env) y =
        if String.compare (x, y) = 0 then a else env y

    fun empty x =
        raise 'Fail (String.concat ["unbound variable: ", x, "\n"])

    (* the static semantics *)
    (* check returns () or fails with 'Fail *)
    fun check (e, env) = match e with
        cases 'Var x => env x
            | 'Num n => ()
            | 'Plus (e1, e2) => (check (e1, env); check (e2, env))
            | 'Let (x, e1, e2) => (check (e1, env);
                                   check (e2, bind ((), x, env)))

    (* the evaluation semantics *)
    fun eval (e, env) = match e with
        cases 'Var x => env x
            | 'Num n => n
            | 'Plus (e1, e2) => eval (e1, env) + eval (e2, env)
            | 'Let (x, e1, e2) =>
                  eval (e2, bind (eval (e1, env), x, env))

    (* the interpreter obtained by composing two functions *)
    fun interp e =
        try r = (check (e, empty); eval (e, empty))
        in r
        handling 'Fail msg => (String.output msg; -1)
        end

Figure 3.2: A simple implementation for the base interpreter.

    (* extensible cases for the static semantics *)
    fun check_case (check, env) =
        cases 'Var x => env x
            | 'Num n => ()
            | 'Plus (e1, e2) => (check (e1, env); check (e2, env))
            | 'Let (x, e1, e2) => (check (e1, env);
                                   check (e2, bind ((), x, env)))

    (* close open recursion for the static semantics *)
    fun check (e, env) = match e with check_case (check, env)

    (* extensible cases for the evaluation semantics *)
    fun eval_case (eval, env) =
        cases 'Var x => env x
            | 'Num n => n
            | 'Plus (e1, e2) => eval (e1, env) + eval (e2, env)
            | 'Let (x, e1, e2) =>
                  eval (e2, bind (eval (e1, env), x, env))

    (* close open recursion for the evaluation semantics *)
    fun eval (e, env) = match e with eval_case (eval, env)

Figure 3.3: Preparation for extensions.
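The preparation step can be imitated in Python by representing a case expression as a table from tags to functions. This is a rough sketch under several assumptions, not the MLPolyR mechanism itself: terms are tagged tuples, the set of variables in scope is a frozenset rather than a function, and `match` is a hypothetical helper playing the role of the match elimination form.

```python
def match(e, cases):
    """Eliminate a tagged tuple ('Tag', arg1, ...) with a table of cases."""
    tag, *args = e
    return cases[tag](*args)


def check_case(check, env):
    """Cases for the static semantics, parameterized over the recursive
    instance `check` (open recursion) and the current scope `env`."""
    def var(x):
        if x not in env:
            raise NameError("unbound variable: " + x)

    def num(n):
        return None

    def plus(e1, e2):
        check(e1, env)
        check(e2, env)

    def let(x, e1, e2):
        check(e1, env)
        check(e2, env | {x})

    return {"Var": var, "Num": num, "Plus": plus, "Let": let}


def check(e, env):
    """Close the open recursion by passing `check` to its own cases."""
    return match(e, check_case(check, env))
```

Because the cases never call `check` by name, only through their parameter, a later extension can pass in a different recursive instance without touching this code. What the sketch cannot show is the static guarantee: in Python a missing case surfaces as a `KeyError` at run time, whereas the MLPolyR type of the cases records exactly which variants are handled.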
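$$
\begin{array}{llcl}
\text{Terms} & e & ::= & \ldots \mid \mathsf{if0}\ (e, e, e)
\end{array}
$$

Static semantics, $\Gamma \vdash e\ \mathsf{ok}$:

$$
\frac{\Gamma \vdash e_1\ \mathsf{ok} \quad \Gamma \vdash e_2\ \mathsf{ok} \quad \Gamma \vdash e_3\ \mathsf{ok}}{\Gamma \vdash \mathsf{if0}\ (e_1, e_2, e_3)\ \mathsf{ok}}
$$

Evaluation semantics, $(E, e) \Downarrow n$:

$$
\frac{(E, e_1) \Downarrow 0 \quad (E, e_2) \Downarrow n_2}{(E, \mathsf{if0}\ (e_1, e_2, e_3)) \Downarrow n_2}
\qquad
\frac{(E, e_1) \Downarrow n_1 \quad n_1 \neq 0 \quad (E, e_3) \Downarrow n_3}{(E, \mathsf{if0}\ (e_1, e_2, e_3)) \Downarrow n_3}
$$

Machine semantics, with transitions of the forms $(k, E, e) \Rightarrow (k', E', e')$,
$(k, E, e) \Rightarrow (n, k')$, $(n, k) \Rightarrow (n', k')$ and $(n, k) \Rightarrow (k', E, e)$:

$$
\begin{array}{llcl}
\text{Frame} & f & ::= & ([\,] + e, E) \mid (n + [\,]) \mid (\mathsf{let}\ x = [\,]\ \mathsf{in}\ e, E) \mid (\mathsf{if0}\ ([\,], e, e), E) \\
\text{Stack} & k & ::= & \bullet \mid f \triangleright k
\end{array}
$$

$$
\begin{array}{lcl}
(k, E, x) & \Rightarrow & (E(x), k) \\
(k, E, n) & \Rightarrow & (n, k) \\
(k, E, e_1 + e_2) & \Rightarrow & (([\,] + e_2, E) \triangleright k,\ E,\ e_1) \\
(k, E, \mathsf{let}\ x = e_1\ \mathsf{in}\ e_2) & \Rightarrow & ((\mathsf{let}\ x = [\,]\ \mathsf{in}\ e_2, E) \triangleright k,\ E,\ e_1) \\
(k, E, \mathsf{if0}\ (e_1, e_2, e_3)) & \Rightarrow & ((\mathsf{if0}\ ([\,], e_2, e_3), E) \triangleright k,\ E,\ e_1) \\
(n, ([\,] + e, E) \triangleright k) & \Rightarrow & ((n + [\,]) \triangleright k,\ E,\ e) \\
(n, (n' + [\,]) \triangleright k) & \Rightarrow & (n' + n,\ k) \\
(n, (\mathsf{let}\ x = [\,]\ \mathsf{in}\ e_2, E) \triangleright k) & \Rightarrow & (k,\ E[x \mapsto n],\ e_2) \\
(0, (\mathsf{if0}\ ([\,], e_2, e_3), E) \triangleright k) & \Rightarrow & (k,\ E,\ e_2) \\
(n, (\mathsf{if0}\ ([\,], e_2, e_3), E) \triangleright k) & \Rightarrow & (k,\ E,\ e_3) \quad \text{where } n \neq 0
\end{array}
$$

Optimization rules, $e \gg e'$:

$$
n_1 + n_2 \gg n \ \ (n = n_1 + n_2)
\qquad
\mathsf{if0}\ (0, e_2, e_3) \gg e_2
\qquad
\mathsf{if0}\ (n, e_2, e_3) \gg e_3 \ \ (n \neq 0)
$$

Figure 3.4: Language extensions: syntax (top), the static semantics (2nd), the
evaluation semantics (3rd), the machine semantics (4th) and optimization rules (bottom).

Language extensions

Figure 3.4 shows how the base language grows. As a conditional term if0 is introduced,
the corresponding rule sets for both the static semantics (check) and the evaluation
semantics (eval) are changed. Instead of the evaluation semantics, alternatively, we
can define the machine semantics (evalm) which makes control explicit by representing
computation stages as stacks of frames. Each frame $f$ corresponds to a piece of work
that has been postponed until a sub-computation is complete. Our machine semantics
follows the conventional single-step transition rules between states (Harper, 2005). It
consists of expression states $(k, E, e)$, value states $(n, k)$ and a transition relation
between states, where $k$ is a stack and $e$ is the current expression. The empty stack is
$\bullet$ and a frame $f$ on top of stack $k$ is written $f \triangleright k$. The machine semantics is given as a
set of single-step transition rules $(k, E, e) \Rightarrow (k', E', e')$ and $(n, k) \Rightarrow (n', k')$ between
states. Additionally, optimization rules may be introduced. We write $e \gg e'$ to say
that $e$ is translated into $e'$ by performing some simple optimization. In our running
example, we consider constant folding and short-circuiting techniques.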
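The machine semantics can be read as a loop over states. The following Python sketch is one possible rendering, not the evalm function of the thesis: the term encoding as tagged tuples, the frame tags `"PlusL"`, `"PlusR"`, `"LetF"` and `"If0F"`, the list used as the stack, and the name `run` are all assumptions made for illustration.

```python
def run(e):
    """Run term e on a stack machine with expression and value states."""
    k = []                   # the stack of frames; the top is the last element
    state = ("eval", {}, e)  # expression state (k, E, e), with k held in `k`
    while True:
        if state[0] == "eval":
            _, E, e = state
            tag = e[0]
            if tag == "Num":
                state = ("ret", e[1])
            elif tag == "Var":
                state = ("ret", E[e[1]])
            elif tag == "Plus":
                k.append(("PlusL", e[2], E))         # frame ([] + e2, E)
                state = ("eval", E, e[1])
            elif tag == "Let":
                k.append(("LetF", e[1], e[3], E))    # frame (let x = [] in e2, E)
                state = ("eval", E, e[2])
            elif tag == "If0":
                k.append(("If0F", e[2], e[3], E))    # frame (if0 ([], e2, e3), E)
                state = ("eval", E, e[1])
            else:
                raise ValueError("unknown term: " + repr(e))
        else:
            _, n = state     # value state (n, k)
            if not k:
                return n     # empty stack: n is the final result
            frame = k.pop()
            if frame[0] == "PlusL":
                _, e2, E = frame
                k.append(("PlusR", n))               # frame (n + [])
                state = ("eval", E, e2)
            elif frame[0] == "PlusR":
                state = ("ret", frame[1] + n)
            elif frame[0] == "LetF":
                _, x, body, E = frame
                state = ("eval", {**E, x: n}, body)
            else:            # "If0F": select a branch by the returned value
                _, e2, e3, E = frame
                state = ("eval", E, e2 if n == 0 else e3)
```

Each branch of the loop corresponds to one transition rule of Figure 3.4, and no Python recursion is used: all pending work lives in the explicit stack, which is what "making control explicit" amounts to.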



Implementation of extensions 

With our preparation for extensions in place, we only have to focus on a single new
case ('If0) by letting the original set of other cases be handled by check_case. Figure 3.5
shows how an extended checker echeck, now handling five cases including 'If0, is
obtained by closing the recursion through applying echeck_case to echeck (Line 8).
The extension of eval, called eeval, is constructed analogously by applying eeval_case,
whose type is computed as follows:

$$
\begin{array}{l}
\texttt{val eeval\_case} : \\
\quad \forall \beta{:}\emptyset.\ \big((\alpha,\ \texttt{string} \xrightarrow{\beta} \texttt{int}) \xrightarrow{\beta} \texttt{int},\ \ \texttt{string} \xrightarrow{\beta} \texttt{int}\big) \xrightarrow{\beta} \\
\qquad \big(\langle\, \texttt{'If0 of}\ (\alpha, \alpha, \alpha), \\
\qquad\ \ \ \texttt{'Let of}\ (\texttt{string}, \alpha, \alpha), \\
\qquad\ \ \ \texttt{'Num of int}, \\
\qquad\ \ \ \texttt{'Plus of}\ (\alpha, \alpha), \\
\qquad\ \ \ \texttt{'Var of string} \,\rangle \overset{\beta}{\hookrightarrow} \texttt{int}\big)
\end{array}
$$

Finally, the extended interpreter can be obtained by applying eeval and echeck,
instead of eval and check (Line 22).
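The same extension can be sketched in Python for the evaluator. Again this is only an approximation of the MLPolyR code in Figure 3.5: case expressions are modeled as dicts from tags to functions, environments as dicts, and `match`, `evaluate` and `eevaluate` are hypothetical names.

```python
def match(e, cases):
    """Eliminate a tagged tuple ('Tag', arg1, ...) with a table of cases."""
    tag, *args = e
    return cases[tag](*args)


def eval_case(ev, env):
    """Base cases; recursive calls go through the parameter `ev`."""
    return {
        "Var": lambda x: env[x],
        "Num": lambda n: n,
        "Plus": lambda e1, e2: ev(e1, env) + ev(e2, env),
        "Let": lambda x, e1, e2: ev(e2, {**env, x: ev(e1, env)}),
    }


def eeval_case(ev, env):
    """Extension: one new case, everything else is deferred to eval_case."""
    cases = dict(eval_case(ev, env))
    cases["If0"] = (
        lambda e1, e2, e3: ev(e2, env) if ev(e1, env) == 0 else ev(e3, env)
    )
    return cases


def evaluate(e, env):
    """Base evaluator: closes the recursion over the base cases."""
    return match(e, eval_case(evaluate, env))


def eevaluate(e, env):
    """Extended evaluator: closes the recursion over the extended cases."""
    return match(e, eeval_case(eevaluate, env))
```

Because the base cases recurse through `ev`, a conditional nested inside an addition is handled correctly by `eevaluate`, while `evaluate` itself is left unchanged and still knows nothing about the new variant.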

     1  (* extends check_case with a new case ('If0) *)
     2  fun echeck_case (check, env) =
     3      cases 'If0 (e1, e2, e3) =>
     4          (check (e1, env); check (e2, env); check (e3, env))
     5      default: check_case (check, env)
     6
     7  (* close open recursion with the extension *)
     8  fun echeck (e, env) = match e with echeck_case (echeck, env)
     9
    10  (* extends eval_case with a new case ('If0) *)
    11  fun eeval_case (eval, env) =
    12      cases 'If0 (e1, e2, e3) =>
    13          if eval (e1, env) = 0
    14          then eval (e2, env) else eval (e3, env)
    15      default: eval_case (eval, env)
    16
    17  (* close open recursion with the extension *)
    18  fun eeval (e, env) = match e with eeval_case (eeval, env)
    19
    20  (* the extended interpreter by composing extended functions *)
    21  fun einterp e =
    22      try r = (echeck (e, empty); eeval (e, empty))
    23      in r
    24      handling 'Fail msg => (String.output msg; -1)
    25      end

Figure 3.5: Implementation for extensions.

Adding new kinds of functions such as a new optimizer (opt) does not require any
preparation in MLPolyR. For example, the combinator opt which performs constant
folding may be inserted to build an optimized interpreter:

    (* the helper function for handling three cases *)
    fun nope f = cases 'Var x => f ('Var x)
                     | 'Plus (e1, e2) => f ('Plus (e1, e2))
                     | 'Let (x, e1, e2) => f ('Let (x, e1, e2))

    (* 'Plus ('Num n1, 'Num n2) >> 'Num (n1+n2) *)
    (* otherwise, return arguments as received. *)
    fun chkPlus (e1, e2) = match e1 with
        cases 'Num n1 => (match e2 with
                  cases 'Num n2 => 'Num (n1+n2)
                  default: nope (fn _ => 'Plus (e1, e2)))
        default: nope (fn _ => 'Plus (e1, e2))

    (* the optimization rules *)
    fun opt e = match e with
        cases 'Var x => 'Var x
            | 'Num n => 'Num n
            | 'Plus (e1, e2) => chkPlus (opt e1, opt e2)
            | 'Let (x, e1, e2) => 'Let (x, e1, e2)

    (* the optimized interpreter by composing three functions *)
    fun optimizedInterp e =
        try r = (check (e, empty); eval (opt e, empty))
        in r
        handling 'Fail msg => (String.output msg; -1)
        end

where we define a function chkPlus which returns 'Num (n1 + n2) if its two arguments are
recursively optimized to 'Num n1 and 'Num n2, respectively. Otherwise, it returns
'Plus (opt e1, opt e2). Even though adding functions does not cause any trouble, opt
itself should also be prepared for extension because opt itself may be extended to
support a conditional term:

    (* extensible cases for the optimization rules *)
    fun opt_case opt =
        cases 'Var x => 'Var x
            | 'Num n => 'Num n
            | 'Plus (e1, e2) => chkPlus (opt e1, opt e2)
            | 'Let (x, e1, e2) => 'Let (x, e1, e2)

    (* close open recursion for the optimization rules *)
    fun opt e = match e with opt_case opt
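As a sketch of where this preparation leads, the following Python code extends the optimizer with the short-circuiting rule for conditionals from Figure 3.4. It is an illustration under assumptions, not code from the thesis: cases are dicts from tags to functions, terms are tagged tuples, and the names `chk_if0`, `eopt_case` and `eopt` are hypothetical. As in the listing above, the 'Let case returns its argument unchanged.

```python
def match(e, cases):
    """Eliminate a tagged tuple ('Tag', arg1, ...) with a table of cases."""
    tag, *args = e
    return cases[tag](*args)


def chk_plus(e1, e2):
    """Constant folding: Plus(Num n1, Num n2) becomes Num (n1 + n2)."""
    if e1[0] == "Num" and e2[0] == "Num":
        return ("Num", e1[1] + e2[1])
    return ("Plus", e1, e2)


def chk_if0(e1, e2, e3):
    """Short-circuiting: a constant condition selects one branch."""
    if e1[0] == "Num":
        return e2 if e1[1] == 0 else e3
    return ("If0", e1, e2, e3)


def opt_case(opt):
    """Extensible cases for the optimization rules (open recursion)."""
    return {
        "Var": lambda x: ("Var", x),
        "Num": lambda n: ("Num", n),
        "Plus": lambda e1, e2: chk_plus(opt(e1), opt(e2)),
        "Let": lambda x, e1, e2: ("Let", x, e1, e2),
    }


def eopt_case(opt):
    """Extension with one new case for the conditional term."""
    cases = dict(opt_case(opt))
    cases["If0"] = lambda e1, e2, e3: chk_if0(opt(e1), opt(e2), opt(e3))
    return cases


def opt(e):
    return match(e, opt_case(opt))


def eopt(e):
    return match(e, eopt_case(eopt))
```

The extended optimizer reuses the base cases unchanged, and because the recursion is closed over `eopt`, a foldable sum in the condition of a conditional is simplified before the short-circuiting rule is tried.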



Related work 

By using the well-known expression problem, we have demonstrated that the MLPolyR
language features make it possible to easily extend existing code with new cases. Such
extensions do not require any changes to existing code and are written in a style of
composable extensions. These language mechanisms play an important role in providing
a solution to the expression problem. Since Wadler described the difficulty of two-way
extensions, there have been many attempts at solving the expression problem.



Most of them have been studied in an object-oriented context (Odersky and Wadler,
1997; Bourdoncle and Merz, 1997; Findler and Flatt, 1999; Flatt, 1999; Bruce, 2003).
Some tried to adopt a functional style using the Visitor design pattern to achieve easy
extensions for adding new operations (Gamma et al., 1995). However, this approach
made it difficult to add new data. To obtain extensibility in both dimensions, variants
were proposed such as the Extensible Visitor pattern and extensible algebraic
datatypes with defaults (Krishnamurthi et al., 1998; Zenger and Odersky, 2001), but
they did not guarantee static type safety. Torgersen provided his solution using
generics and a simple trick (in order to overcome typing problems) in Java (Torgersen, 2004).
His insight was to use genericity to allow member functions to extend without modifying
the type of the parent class, but his approach required rather complex programming
protocols to be observed.



As for the functional approach, Garrigue presented his solution based on polymorphic
variants in OCaml (Garrigue, 1998, 2000). As Zenger and Odersky point out (Zenger and
Odersky, 2001), variant dispatching requires explicit forwarding of function calls. This is a
consequence of the fact that in Garrigue's system, extensions need to know what they are
extending. As a result, his solution is similar to our two-way extensible interpreter
example but somewhat less general.

Because extensions along one direction can be more difficult than along the other
depending on implementation mechanisms, the expression problem is often said to
reveal "tension in language design" (Wadler, 1998). Naturally, there have been attempts
to live in the "best of both worlds" in order to design languages powerful enough
to provide better solutions. For example, the Scala language integrates features of
object-oriented and functional languages and provides type-safe solutions by using
its abstract types and mixin composition (Zenger and Odersky, 2005). OCaml also
presents similar solutions due to the benefits of its integration of object-oriented
features into ML (Rémy and Vouillon, 1998; Rémy and Garrigue, 2004). As a smooth
way of integration, OML and Extensible ML (EML) generalize ML constructs to
support extensibility instead of directly providing class and method definitions as in
OCaml (Reppy and Riecke, 1996; Millstein et al., 2002). In particular, EML supports
extensible functions as well as extensible datatypes. However, a function's extensibility
in EML is second-class and EML requires explicit type annotations due to the difficulty
of polymorphic type inference in the presence of subtyping, while extensible cases in
MLPolyR are first-class values and fully general type inference is provided by a variant
of the classic algorithm W (Milner, 1978a), only extended to handle Rémy-style
row polymorphism and equi-recursive types.



3.3 The External Language (EL) 

In this section, we explore theoretical aspects of the MLPolyR language that we 
have seen informally. First, we start by describing EL, our implicitly typed external 
language that provides sums, cases, and mechanisms for raising as well as handling 
exceptions. 

3.3.1 Syntax 

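Figure 3.6 shows the definitions of expressions $e$ and values $v$. We have integer
constants $n$, variables $x$, injection into sum types $l\ e$, applications $e_1\ e_2$, recursive
functions $\mathsf{fun}\ f\ x = e$, and let-bindings $\mathsf{let}\ x = e_1\ \mathsf{in}\ e_2$. For record expressions, we have record
constructors $\{l_1 = e_1, \ldots, l_n = e_n\}$ (which we will often abbreviate as $\{l_i = e_i\}_{i=1}^{n}$),
record extensions $e_1 \otimes \{l = e_2\}$, record subtractions $e \oslash l$ and record selections $e.l$.
For case expressions, we have case constructors $\{l_1\ x_1 \Rightarrow e_1, \ldots, l_n\ x_n \Rightarrow e_n\}$
(abbreviated as $\{l_i\ x_i \Rightarrow e_i\}_{i=1}^{n}$), case extensions $e_1 \oplus \{l\ x \Rightarrow e_2\}$, case subtractions
$e \ominus l$ and match expressions $\mathsf{match}\ e_1\ \mathsf{with}\ e_2$ which matches $e_1$ to the expression
$e_2$ whose value must be a case. There are also $\mathsf{raise}\ e$ for raising exceptions and
several forms for managing exception handlers: The form $e_1\ \mathsf{handle}\ \{l\ x \Rightarrow e_2\}$
establishes a handler for the exception constructor $l$. The new exception context is
used for evaluating $e_1$, while the old context is used for $e_2$ in case $e_1$ raises $l$. The
old context cannot already have a handler for $l$. The form $e_1\ \mathsf{rehandle}\ \{l\ x \Rightarrow e_2\}$,
on the other hand, overrides an existing handler for $l$. Again, the original exception
context is restored before executing $e_2$. The form $e_1\ \mathsf{handle}\ \{x \Rightarrow e_2\}$ establishes
a new context with handlers for all exceptions that $e_1$ might raise. As before, $e_2$ is
evaluated in the original context. The form $e\ \mathsf{unhandle}\ l$ evaluates $e$ in a context
from which the handler for $l$ has been removed. The original context must have a
handler for $l$.

$$
\begin{array}{llcl}
\text{Terms} & e & ::= & n \mid x \mid l\ e \mid e_1\ e_2 \mid \mathsf{fun}\ f\ x = e \mid \mathsf{let}\ x = e_1\ \mathsf{in}\ e_2 \\
 & & \mid & \{l_i = e_i\}_{i=1}^{n} \mid e_1 \otimes \{l = e_2\} \mid e \oslash l \mid e.l \\
 & & \mid & \{l_i\ x_i \Rightarrow e_i\}_{i=1}^{n} \mid \mathsf{match}\ e_1\ \mathsf{with}\ e_2 \mid e \ominus l \mid e_1 \oplus \{l\ x \Rightarrow e_2\} \\
 & & \mid & \mathsf{raise}\ e \mid e_1\ \mathsf{handle}\ \{l\ x \Rightarrow e_2\} \\
 & & \mid & e_1\ \mathsf{rehandle}\ \{l\ x \Rightarrow e_2\} \mid e_1\ \mathsf{handle}\ \{x \Rightarrow e_2\} \mid e\ \mathsf{unhandle}\ l \\
\text{Values} & v & ::= & n \mid \mathsf{fun}\ f\ x = e \mid l\ v \mid \{l_i = v_i\}_{i=1}^{n} \mid \{l_i\ x_i \Rightarrow e_i\}_{i=1}^{n} \\
\text{Kinds} & \kappa & ::= & \star \mid L \\
\text{Label sets} & L & ::= & \{l_1, \ldots, l_n\} \\
\text{Types} & \tau & ::= & \alpha \mid \mathsf{int} \mid \tau_1 \xrightarrow{\rho} \tau_2 \mid \langle\rho_1\rangle \overset{\rho}{\hookrightarrow} \tau \mid \{\rho\} \mid \langle\rho\rangle \mid \alpha\ \mathsf{as}\ \langle\rho\rangle \\
 & \rho & ::= & \alpha \mid \cdot \mid l : \tau, \rho \\
 & \theta & ::= & \tau \mid \rho \\
\text{Schemas} & \sigma & ::= & \tau \mid \forall \alpha : \kappa.\ \sigma \\
\text{Typenv} & \Gamma & ::= & \emptyset \mid \Gamma, x \mapsto \sigma \\
\text{Kindenv} & \Delta & ::= & \emptyset \mid \Delta, \alpha \mapsto \kappa
\end{array}
$$

Figure 3.6: External language (EL) syntax.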

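The type language for EL is also given in Figure 3.6. It contains type variables
($\alpha$, $\beta$, ...), base types (e.g., int), constructors for function- and case types ($\to$ and
$\hookrightarrow$), record types ($\{\rho\}$), sum types ($\langle\rho\rangle$), recursive sum types ($\alpha\ \mathsf{as}\ \langle\rho\rangle$), the empty
row type ($\cdot$), and row types with at least one typed label ($l : \tau, \rho$). Notice that
function- and case arrows take three type arguments: the domain, the co-domain,
and a row type describing the exceptions that could be raised during an invocation.
A type $\theta$ is either an ordinary type $\tau$ or a row type $\rho$. Kinding judgments of the form
$\Delta \vdash \tau : \kappa$ (stating that in the current kinding context $\Delta$ type $\tau$ has kind $\kappa$) are used
to distinguish between these cases and to establish that types are well-formed. As a
convention, wherever possible we will use meta-variables such as $\rho$ for row types and $\tau$
for ordinary types. Where this distinction is not needed, for example for polymorphic
instantiation (var in Figure 3.10), we will use the letter $\theta$.

$$
\frac{\Delta(\alpha) = \star}{\Delta \vdash \alpha : \star}
\qquad
\frac{}{\Delta \vdash \mathsf{int} : \star}
\qquad
\frac{\Delta \vdash \tau : \star \quad \Delta \vdash \tau' : \star \quad \Delta \vdash \rho : \emptyset}{\Delta \vdash \tau \xrightarrow{\rho} \tau' : \star}
\qquad
\frac{}{\Delta \vdash \cdot : L}
\qquad
\frac{\Delta \vdash \rho : \emptyset}{\Delta \vdash \langle\rho\rangle : \star}
$$

$$
\frac{L \subseteq \Delta(\alpha)}{\Delta \vdash \alpha : L}
\qquad
\frac{\Delta \vdash \rho' : \emptyset \quad \Delta \vdash \tau : \star \quad \Delta \vdash \rho : \emptyset}{\Delta \vdash \langle\rho'\rangle \overset{\rho}{\hookrightarrow} \tau : \star}
\qquad
\frac{\Delta \vdash \tau : \star \quad \Delta \vdash \rho : L \cup \{l\} \quad l \notin L}{\Delta \vdash (l : \tau, \rho) : L}
$$

Figure 3.7: Well-formedness for types in EL.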

Ordinary types have kind $\star$. A row type $\rho$ has kind $L$ where $L$ is a set of labels
which are known not to occur in $\rho$. An unconstrained row variable has kind $\emptyset$.
Inference rules are given in Figure 3.7. The use of a kinding judgment in a typing
rule constrains $\Delta$ and ultimately propagates kinding information back to the let/val
rule in Figure 3.10 where type variables are bound and kinding information is used
to form type schemas denoted by $\sigma$.

$$
\begin{array}{lcl}
e & ::= & \ldots \mid \mathsf{restore}\ E_{exn}\ e \\
E & ::= & [\,] \mid l\ E \mid E\ e \mid v\ E \mid \mathsf{let}\ x = E\ \mathsf{in}\ e \\
 & \mid & \{\ldots, l_{i-1} = v_{i-1},\ l_i = E,\ l_{i+1} = e_{i+1}, \ldots\} \mid E.l \\
 & \mid & E \otimes \{l = e\} \mid v \otimes \{l = E\} \mid E \oslash l \\
 & \mid & E \oplus \{l\ x \Rightarrow e\} \mid E \ominus l \\
 & \mid & \mathsf{match}\ E\ \mathsf{with}\ e \mid \mathsf{match}\ v\ \mathsf{with}\ E \\
 & \mid & \mathsf{raise}\ E \mid \mathsf{restore}\ E_{exn}\ E \\
r & ::= & (\mathsf{fun}\ f\ x = e)\ v \mid \mathsf{let}\ x = v\ \mathsf{in}\ e \\
 & \mid & v_1 \otimes \{l = v_2\} \mid v \oslash l \mid v.l \\
 & \mid & v \oplus \{l\ x \Rightarrow e\} \mid v \ominus l \mid \mathsf{match}\ v_1\ \mathsf{with}\ v_2 \\
 & \mid & \mathsf{raise}\ l\ v \mid e_1\ \mathsf{handle}\ \{l\ x \Rightarrow e_2\} \mid e_1\ \mathsf{rehandle}\ \{l\ x \Rightarrow e_2\} \\
 & \mid & e\ \mathsf{unhandle}\ l \mid e_1\ \mathsf{handle}\ \{x \Rightarrow e_2\} \mid \mathsf{restore}\ E_{exn}\ v \\
E_{exn} & ::= & \{l_1 = E_1, \ldots, l_n = E_n\}
\end{array}
$$

Figure 3.8: Evaluation contexts $E$, redexes $r$ and exception contexts $E_{exn}$.

3.3.2 Operational semantics

We give an operational small-step semantics for EL as a context-sensitive rewrite
system in a style inspired by Felleisen and Hieb (Felleisen and Hieb, 1992). An evaluation
context $E$ is essentially a term with one sub-term replaced by a hole (see Figure 3.8).
Any closed expression $e$ that is not a value has a unique decomposition $E[r]$ into an
evaluation context $E$ and a redex $r$ that is placed into the hole within $E$. Evaluation
contexts in this style of semantics represent continuations. The rule for handling
an exception could be written simply as $E[(E'[\mathsf{raise}\ l\ v])\ \mathsf{handle}\ \{l\ x \Rightarrow e\}] \mapsto E[e[v/x]]$,
but this requires an awkward side-condition stating that $E'$ must not also
contain a handler for $l$. We avoid this difficulty by maintaining the exception context
separately and explicitly on a per-constructor basis. This choice makes it clear
that exception contexts can be seen as extensible records of continuations. However,
we now also need to be explicit about where a computation re-enters the
scope of a previous context. This is the purpose of restore-frames of the form
$\mathsf{restore}\ E_{exn}\ e$ that we added to the language, but which are assumed not to occur
in source expressions. There are real-world implementations of languages with
exception handlers where restore-frames have a concrete manifestation. For example,
SML/NJ (Appel and MacQueen, 1991) represents the exception handler as a
variable storing a continuation. When leaving the scope of a handler, this variable
gets assigned the previous exception continuation.

An exception context $E_{exn}$ is a record $\{l_1 = E_1, \ldots, l_n = E_n\}$ of evaluation
contexts $E_1, \ldots, E_n$ labeled $l_1, \ldots, l_n$. A reducible configuration $(E[r], E_{exn})$ pairs a
redex $r$ in context $E$ with a corresponding exception context $E_{exn}$ that represents all
exception handlers that are available when reducing $r$. A final configuration is a pair
$(v, \{\})$ where $v$ is a value. Given a reducible configuration $(E[r], E_{exn})$, we call the
pair $(E, E_{exn})$ the full context of $r$.

The semantics is given as a set of single-step transition rules from reducible
configurations to configurations: $(E[r], E_{exn}) \mapsto (E[e], E'_{exn})$. That is, a pair of an
evaluation context with a redex $E[r]$ and an exception context $E_{exn}$ evaluates to a
pair of an evaluation context with an evaluated expression $E[e]$ and a new exception
context $E'_{exn}$ in a single step. A program (i.e., a closed expression) $e$ evaluates
to a value $v$ if $(e, \{\})$ can be reduced in the transitive closure of our step
relation to a final configuration $(v, \{\})$. Rules unrelated to exceptions are standard
and leave the exception context unchanged. The rule for $\mathsf{raise}\ l\ v$ selects field $l$ of
the exception context and places $v$ into its hole. The result, paired with the empty
exception context, is the new configuration which, by construction, will have the
form $(E'[\mathsf{restore}\ E'_{exn}\ v], \{\})$ so that the next step will restore exception context
$E'_{exn}$. The rules for $e_1\ \mathsf{handle}\ \{l\ x \Rightarrow e_2\}$ and $e_1\ \mathsf{rehandle}\ \{l\ x \Rightarrow e_2\}$ as well as
$e\ \mathsf{unhandle}\ l$ are very similar to each other: one adds a new field to the exception
context, another replaces an existing field, and the third drops a field. All
exception-handling constructs augment the current evaluation context with a restore-form so
that the original context is re-established if and when $e_1$ reduces to a value.
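The view of an exception context as an extensible record can be sketched in Python with dictionaries. This is a deliberate simplification and not the semantics itself: handlers are modeled as ordinary functions of the payload rather than as evaluation contexts, restore-frames are not modeled at all, and the function names (`handle`, `rehandle`, `unhandle`, `raise_exn`) are chosen for illustration.

```python
def handle(ctx, label, k):
    """Add a handler for `label`; the old context must not have one."""
    if label in ctx:
        raise KeyError("handler already present for " + label)
    return {**ctx, label: k}


def rehandle(ctx, label, k):
    """Replace the existing handler for `label`."""
    if label not in ctx:
        raise KeyError("no handler to replace for " + label)
    return {**ctx, label: k}


def unhandle(ctx, label):
    """Drop the handler for `label`; the old context must have one."""
    if label not in ctx:
        raise KeyError("no handler to remove for " + label)
    return {l: k for l, k in ctx.items() if l != label}


def raise_exn(ctx, label, payload):
    """Select field `label` of the context and pass the payload to it."""
    return ctx[label](payload)
```

Each operation returns a new dictionary and leaves the old one intact, which mirrors the way the original context remains available to be restored. The run-time `KeyError` checks stand in for the side conditions that the type system of the next section enforces statically through row types.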



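$$
\begin{array}{lcll}
(E[(\mathsf{fun}\ f\ x = e)\ v],\ E_{exn}) & \mapsto & (E[e[\mathsf{fun}\ f\ x = e/f,\ v/x]],\ E_{exn}) & \text{(app)} \\
(E[\mathsf{let}\ x = v\ \mathsf{in}\ e],\ E_{exn}) & \mapsto & (E[e[v/x]],\ E_{exn}) & \text{(let)} \\
(E[\{\ldots, l_i = v_i, \ldots\}.l_i],\ E_{exn}) & \mapsto & (E[v_i],\ E_{exn}) & \text{(r/sel)} \\
(E[\{l_i = v_i\}_{i=1}^{n} \otimes \{l = v\}],\ E_{exn}) & \mapsto & (E[\{l_1 = v_1, \ldots, l_n = v_n, l = v\}],\ E_{exn}) & \text{(r/ext)} \\
(E[\{l_i = v_i\}_{i=1}^{n} \oslash l_j],\ E_{exn}) & \mapsto & (E[\{l_i = v_i\}_{i=1, i \neq j}^{n}],\ E_{exn}) & \text{(r/sub)} \\
(E[\{l_i\ x_i \Rightarrow e_i\}_{i=1}^{n} \oplus \{l\ x \Rightarrow e\}],\ E_{exn}) & \mapsto & (E[\{l_1\ x_1 \Rightarrow e_1, \ldots, l_n\ x_n \Rightarrow e_n, l\ x \Rightarrow e\}],\ E_{exn}) & \text{(c/ext)} \\
(E[\{l_i\ x_i \Rightarrow e_i\}_{i=1}^{n} \ominus l_j],\ E_{exn}) & \mapsto & (E[\{l_i\ x_i \Rightarrow e_i\}_{i=1, i \neq j}^{n}],\ E_{exn}) & \text{(c/sub)} \\
(E[\mathsf{match}\ l_i\ v\ \mathsf{with}\ \{\ldots, l_i\ x_i \Rightarrow e_i, \ldots\}],\ E_{exn}) & \mapsto & (E[e_i[v/x_i]],\ E_{exn}) & \text{(match)} \\
(E[\mathsf{raise}\ l_i\ v],\ \{\ldots, l_i = E_i, \ldots\}) & \mapsto & (E_i[v],\ \{\}) & \text{(raise)} \\
(E[e_1\ \mathsf{handle}\ \{l\ x \Rightarrow e_2\}],\ E_{exn}) & \mapsto & (E[\mathsf{restore}\ E_{exn}\ e_1],\ E'_{exn}) & \text{(handle)} \\
\multicolumn{4}{l}{\quad \text{where } E_{exn} = \{l_i = E_i\}_{i=1}^{n} \text{ and } E'_{exn} = \{l_1 = E_1, \ldots, l_n = E_n,\ l = E[\mathsf{let}\ x = \mathsf{restore}\ E_{exn}\ [\,]\ \mathsf{in}\ e_2]\}} \\
(E[e_1\ \mathsf{rehandle}\ \{l\ x \Rightarrow e_2\}],\ E_{exn}) & \mapsto & (E[\mathsf{restore}\ E_{exn}\ e_1],\ E'_{exn}) & \text{(rehandle)} \\
\multicolumn{4}{l}{\quad \text{where } E_{exn} = \{l_i = E_i\}_{i=1}^{n} \text{ and } E'_{exn} = \{l_i = E'_i\}_{i=1}^{n} \text{ and } \forall i \neq j.\ E'_i = E_i,} \\
\multicolumn{4}{l}{\quad \text{and } l = l_j \text{ and } E'_j = E[\mathsf{let}\ x = \mathsf{restore}\ E_{exn}\ [\,]\ \mathsf{in}\ e_2]} \\
(E[e\ \mathsf{unhandle}\ l_j],\ E_{exn}) & \mapsto & (E[\mathsf{restore}\ E_{exn}\ e],\ E'_{exn}) & \text{(unhandle)} \\
\multicolumn{4}{l}{\quad \text{where } E_{exn} = \{l_i = E_i\}_{i=1}^{n} \text{ and } E'_{exn} = \{l_i = E_i\}_{i=1, i \neq j}^{n}} \\
(E[e_1\ \mathsf{handle}\ \{x \Rightarrow e_2\}],\ E_{exn}) & \mapsto & (E[\mathsf{restore}\ E_{exn}\ e_1],\ E'_{exn}) & \text{(handle all)} \\
\multicolumn{4}{l}{\quad \text{where } E'_{exn} = \{l_i = E[\mathsf{let}\ x = (\mathsf{restore}\ E_{exn}\ (l_i\ [\,]))\ \mathsf{in}\ e_2]\}_{i=1}^{n} \text{ (for some } n)} \\
(E[\mathsf{restore}\ E'_{exn}\ v],\ E_{exn}) & \mapsto & (E[v],\ E'_{exn}) & \text{(restore)}
\end{array}
$$

Figure 3.9: Operational semantics for EL.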

3.3.3 Static semantics 

The type $\tau$ of a closed expression $e$ characterizes the values that $e$ can evaluate to.
From a dual point of view it describes the values that the evaluation context $E$ must be
able to receive. In our operational semantics $E$ is extended to a full context $(E, E_{exn})$,
so the goal is to develop a type system with judgments that describe the full context
of a given expression. Our typing judgments have an additional component $\rho$ that
describes $E_{exn}$ by individually characterizing its constituent labels and evaluation
contexts.

General typing judgments have the form $\Delta; \Gamma \vdash e : \tau; \rho$, expressing that $e$ has type
$\tau$ and exception type $\rho$. The typing environment $\Gamma$ is a finite map assigning types
to the free variables of $e$. Similarly, the kinding environment $\Delta$ maps the free type
variables of $\tau$, $\rho$, and $\Gamma$ to their kinds.

The typing rules for EL are given in Figure 3.10 and Figure 3.11. Typing is
syntax-directed; for most syntactic constructs there is precisely one rule, the only
exceptions being the rules for fun and let which rely on the notion of syntactic values
to distinguish between two sub-cases. As usual, in rules that introduce polymorphism
we impose the value restriction by requiring certain expressions to be valuable.
Valuable expressions do not have effects and, in particular, do not raise exceptions. We
use a separate typing judgment of the form $\Delta; \Gamma \vdash_v e : \tau$ for syntactic values (var,
int, c, fun/val, and fun/non-val). Judgments for syntactic values are lifted to
the level of judgments for general expressions by the value rule. The value rule
leaves the exception type $\rho$ unconstrained. Administrative rules teq and teq/v deal
with type equivalences $\tau \approx \tau'$, which express the relationship between two (row-)
types where they are considered equal up to permutation of their fields. Rules for
$\tau \approx \tau'$ are described in Figure 3.12.



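$$
\frac{\Gamma(x) = \forall \alpha_1 : \kappa_1 \ldots \forall \alpha_n : \kappa_n.\ \tau \quad \forall i \in 1..n.\ \Delta \vdash \theta_i : \kappa_i}{\Delta; \Gamma \vdash_v x : \tau[\theta_1/\alpha_1, \ldots, \theta_n/\alpha_n]}\ \text{(var)}
\qquad
\frac{}{\Delta; \Gamma \vdash_v n : \mathsf{int}}\ \text{(int)}
$$

$$
\frac{\forall i \in 1..n.\ \Delta; \Gamma, x_i \mapsto \tau_i \vdash e_i : \tau; \rho \quad \Delta \vdash (l_1 : \tau_1, \ldots, l_n : \tau_n, \cdot) : \emptyset}{\Delta; \Gamma \vdash_v \{l_i\ x_i \Rightarrow e_i\}_{i=1}^{n} : \langle l_1 : \tau_1, \ldots, l_n : \tau_n \rangle \overset{\rho}{\hookrightarrow} \tau}\ \text{(c)}
$$

$$
\frac{\Delta; \Gamma, f \mapsto (\forall \alpha : \emptyset.\ \tau_2 \xrightarrow{\alpha} \tau), x \mapsto \tau_2 \vdash_v e : \tau \quad \Delta \vdash \tau_2 : \star \quad \Delta \vdash \rho : \emptyset}{\Delta; \Gamma \vdash_v \mathsf{fun}\ f\ x = e : \tau_2 \xrightarrow{\rho} \tau}\ \text{(fun/val)}
$$

$$
\frac{\Delta; \Gamma, f \mapsto \tau_2 \xrightarrow{\rho} \tau, x \mapsto \tau_2 \vdash e : \tau; \rho \quad \Delta \vdash \tau_2 : \star \quad \Delta \vdash \rho : \emptyset}{\Delta; \Gamma \vdash_v \mathsf{fun}\ f\ x = e : \tau_2 \xrightarrow{\rho} \tau}\ \text{(fun/non-val)}
$$

$$
\frac{\Delta; \Gamma \vdash_v e : \tau' \quad \tau \approx \tau'}{\Delta; \Gamma \vdash_v e : \tau}\ \text{(teq/v)}
\qquad
\frac{\Delta; \Gamma \vdash e : \tau'; \rho' \quad \tau \approx \tau' \quad \rho \approx \rho'}{\Delta; \Gamma \vdash e : \tau; \rho}\ \text{(teq)}
\qquad
\frac{\Delta; \Gamma \vdash_v e : \tau \quad \Delta \vdash \rho : \emptyset}{\Delta; \Gamma \vdash e : \tau; \rho}\ \text{(value)}
$$

$$
\frac{\Delta; \Gamma \vdash e_1 : \tau_2 \xrightarrow{\rho} \tau; \rho \quad \Delta; \Gamma \vdash e_2 : \tau_2; \rho}{\Delta; \Gamma \vdash e_1\ e_2 : \tau; \rho}\ \text{(app)}
$$

$$
\frac{\alpha_1, \ldots, \alpha_n = \mathrm{FTV}(\tau_1) \setminus \mathrm{FTV}(\Gamma) \quad \Delta, \alpha_1 \mapsto \kappa_1, \ldots, \alpha_n \mapsto \kappa_n; \Gamma \vdash_v e_1 : \tau_1 \quad \Delta; \Gamma, x \mapsto \forall \alpha_1 : \kappa_1 \ldots \forall \alpha_n : \kappa_n.\ \tau_1 \vdash e_2 : \tau_2; \rho}{\Delta; \Gamma \vdash \mathsf{let}\ x = e_1\ \mathsf{in}\ e_2 : \tau_2; \rho}\ \text{(let/val)}
$$

$$
\frac{\Delta; \Gamma \vdash e_1 : \tau_1; \rho \quad \Delta; \Gamma, x \mapsto \tau_1 \vdash e_2 : \tau_2; \rho}{\Delta; \Gamma \vdash \mathsf{let}\ x = e_1\ \mathsf{in}\ e_2 : \tau_2; \rho}\ \text{(let/non-val)}
$$

$$
\frac{\Delta; \Gamma \vdash e : \tau; \rho' \quad \Delta \vdash (l : \tau, \rho) : \emptyset}{\Delta; \Gamma \vdash l\ e : \langle l : \tau, \rho \rangle; \rho'}\ \text{(dcon)}
\qquad
\frac{\Delta; \Gamma \vdash e : \langle \rho[\alpha\ \mathsf{as}\ \langle\rho\rangle/\alpha] \rangle; \rho'}{\Delta; \Gamma \vdash e : \alpha\ \mathsf{as}\ \langle\rho\rangle; \rho'}\ \text{(roll)}
\qquad
\frac{\Delta; \Gamma \vdash e : \alpha\ \mathsf{as}\ \langle\rho\rangle; \rho'}{\Delta; \Gamma \vdash e : \langle \rho[\alpha\ \mathsf{as}\ \langle\rho\rangle/\alpha] \rangle; \rho'}\ \text{(unroll)}
$$

Figure 3.10: Typing rules for EL for syntactic values (top), type equivalence and
lifting (2nd) and basic computations (bottom).

$$
\frac{\forall i \in 1..n.\ \Delta; \Gamma \vdash e_i : \tau_i; \rho \quad \Delta \vdash (l_1 : \tau_1, \ldots, l_n : \tau_n, \cdot) : \emptyset}{\Delta; \Gamma \vdash \{l_i = e_i\}_{i=1}^{n} : \{l_i : \tau_i\}_{i=1}^{n}; \rho}\ \text{(r)}
$$

$$
\frac{\Delta; \Gamma \vdash e_1 : \{\rho\}; \rho' \quad \Delta \vdash (l : \tau_2, \rho) : \emptyset \quad \Delta; \Gamma \vdash e_2 : \tau_2; \rho'}{\Delta; \Gamma \vdash e_1 \otimes \{l = e_2\} : \{l : \tau_2, \rho\}; \rho'}\ \text{(r/ext)}
$$

$$
\frac{\Delta; \Gamma \vdash e : \{l : \tau, \rho\}; \rho'}{\Delta; \Gamma \vdash e \oslash l : \{\rho\}; \rho'}\ \text{(r/sub)}
\qquad
\frac{\Delta; \Gamma \vdash e : \{l : \tau, \rho\}; \rho'}{\Delta; \Gamma \vdash e.l : \tau; \rho'}\ \text{(select)}
$$

$$
\frac{\Delta; \Gamma \vdash e_1 : \langle\rho_1\rangle \overset{\rho}{\hookrightarrow} \tau; \rho' \quad \Delta \vdash (l : \tau_1, \rho_1) : \emptyset \quad \Delta; \Gamma, x \mapsto \tau_1 \vdash e_2 : \tau; \rho}{\Delta; \Gamma \vdash e_1 \oplus \{l\ x \Rightarrow e_2\} : \langle l : \tau_1, \rho_1 \rangle \overset{\rho}{\hookrightarrow} \tau; \rho'}\ \text{(c/ext)}
$$

$$
\frac{\Delta; \Gamma \vdash e_1 : \langle l : \tau', \rho_1 \rangle \overset{\rho}{\hookrightarrow} \tau; \rho'}{\Delta; \Gamma \vdash e_1 \ominus l : \langle\rho_1\rangle \overset{\rho}{\hookrightarrow} \tau; \rho'}\ \text{(c/sub)}
\qquad
\frac{\Delta; \Gamma \vdash e_1 : \langle\rho\rangle; \rho' \quad \Delta; \Gamma \vdash e_2 : \langle\rho\rangle \overset{\rho'}{\hookrightarrow} \tau; \rho'}{\Delta; \Gamma \vdash \mathsf{match}\ e_1\ \mathsf{with}\ e_2 : \tau; \rho'}\ \text{(match)}
$$

$$
\frac{\Delta; \Gamma \vdash e : \langle\rho\rangle; \rho \quad \Delta \vdash \tau : \star}{\Delta; \Gamma \vdash \mathsf{raise}\ e : \tau; \rho}\ \text{(raise)}
\qquad
\frac{\Delta; \Gamma \vdash e_1 : \tau; l : \tau', \rho \quad \Delta; \Gamma, x \mapsto \tau' \vdash e_2 : \tau; \rho}{\Delta; \Gamma \vdash e_1\ \mathsf{handle}\ \{l\ x \Rightarrow e_2\} : \tau; \rho}\ \text{(handle)}
$$

$$
\frac{\Delta; \Gamma \vdash e : \tau; \rho \quad \Delta \vdash (l : \tau', \rho) : \emptyset}{\Delta; \Gamma \vdash e\ \mathsf{unhandle}\ l : \tau; l : \tau', \rho}\ \text{(unhandle)}
\qquad
\frac{\Delta; \Gamma \vdash e_1 : \tau; l : \tau', \rho \quad \Delta; \Gamma, x \mapsto \tau' \vdash e_2 : \tau; l : \tau'', \rho}{\Delta; \Gamma \vdash e_1\ \mathsf{rehandle}\ \{l\ x \Rightarrow e_2\} : \tau; l : \tau'', \rho}\ \text{(rehandle)}
$$

$$
\frac{\Delta; \Gamma \vdash e_1 : \tau; \rho' \quad \Delta; \Gamma, x \mapsto \langle\rho'\rangle \vdash e_2 : \tau; \rho}{\Delta; \Gamma \vdash e_1\ \mathsf{handle}\ \{x \Rightarrow e_2\} : \tau; \rho}\ \text{(handle-all)}
\qquad
\boxed{\frac{\emptyset; \Gamma_0 \vdash e : \mathsf{int}; \cdot}{\Gamma_0 \vdash e\ \mathsf{program}}\ \text{(program)}}
$$

Figure 3.11: Typing rules for EL for computations involving records (top), cases
(2nd) and exceptions (bottom). The judgment for whole programs is shown in the
framed box.

$$
\frac{}{\alpha \approx \alpha}
\qquad
\frac{}{\mathsf{int} \approx \mathsf{int}}
\qquad
\frac{\tau_1 \approx \tau_1' \quad \tau_2 \approx \tau_2' \quad \rho \approx \rho'}{\tau_1 \xrightarrow{\rho} \tau_2 \approx \tau_1' \xrightarrow{\rho'} \tau_2'}
\qquad
\frac{\rho \approx \rho'}{\{\rho\} \approx \{\rho'\}}
\qquad
\frac{\rho \approx \rho'}{\langle\rho\rangle \approx \langle\rho'\rangle}
$$

$$
\frac{\rho \approx \rho'}{\alpha\ \mathsf{as}\ \langle\rho\rangle \approx \alpha\ \mathsf{as}\ \langle\rho'\rangle}
\qquad
\frac{\rho_1 \approx \rho_1' \quad \tau \approx \tau' \quad \rho_2 \approx \rho_2'}{\langle\rho_1\rangle \overset{\rho_2}{\hookrightarrow} \tau \approx \langle\rho_1'\rangle \overset{\rho_2'}{\hookrightarrow} \tau'}
\qquad
\frac{}{\cdot \approx \cdot}
\qquad
\frac{\tau \approx \tau' \quad \rho \approx \rho'}{l : \tau, \rho \approx l : \tau', \rho'}
$$

$$
\frac{\pi \text{ is a permutation of } 1, \ldots, k}{l_1 : \tau_1, \ldots, l_k : \tau_k, \rho \approx l_{\pi(1)} : \tau_{\pi(1)}, \ldots, l_{\pi(k)} : \tau_{\pi(k)}, \rho}
$$

Figure 3.12: The reordering judgment $\approx$.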

Rules unrelated to exceptions simply propagate a single exception type without 
change. This is true even for expressions that have more than one sub-term, matching 
our intuition that the exception type characterizes the exception context. For exam- 
ple, consider function application e e': The rules do not use any form of sub-typing 
to express that the set of exceptions is the union of the three sets corresponding to 
e, e', and the actual application. We rely on polymorphism to collect exception
information across multiple sub-terms. As usual, polymorphism is introduced by the
let/val rule for expressions $\mathsf{let}\ x = e_1\ \mathsf{in}\ e_2$ where $e_1$ is a syntactic value.

The rules for handling and raising exceptions establish bridges between ordinary 
types and handler types (i.e., types of exception handler contexts). Exceptions them- 
selves are simply values of sum type; the raise expression passes such values to an 
appropriate handler. Notice that the corresponding rule equates the row type of the 
sum with the row type of the exception context; there is no implicit subsumption 
here. Instead, subsumption takes place where the exception payload is injected into 
the corresponding sum type (dcon). 

Rule handle-all is the inverse of raise. The form $e_1\ \mathsf{handle}\ \{x \Rightarrow e_2\}$
establishes a handler that catches any exception emanating from $e_1$. The exception
is made available to $e_2$ as a value of sum type bound to variable $x$. Operationally
this corresponds to replacing the current exception handler context with a brand-new
one, tailor-made to fit the needs of $e_1$. The other three constructs do not replace
the exception handler context wholesale but adjust it incrementally: handle adds a
new field to the context while retaining all other fields; rehandle replaces an existing
handler at a specific label $l$ with a new (potentially differently typed) handler at the
same $l$; unhandle removes an existing handler. There are strong parallels between
c/ext (case extension) and handle, although there are also some significant
differences due to the fact that exception handlers constitute a hidden part of the context
while cases are first-class values.

Whole programs are closed up to some initial basis environment $\Gamma_0$, raise no
exceptions, and evaluate to int. This is expressed by a judgment $\Gamma_0 \vdash e\ \mathsf{program}$.

3.3.4 Properties of EL 

The rule for the "handle-all" construct $e_1\ \mathbf{handle}\ \{\, x \Rightarrow e_2 \,\}$ stands out because it is
non-deterministic. Since we represent each handled exception constructor separately,
the rule must guess the relevant set of constructors $\{l_1, \ldots, l_n\}$. Introducing
non-determinism here might seem worrisome, but we can justify it by observing that
different guesses never lead to different outcomes:
Lemma 3.3.1 (Uniqueness)

If $(e, \{\}) \mapsto^* (v, \{\})$ and $(e, \{\}) \mapsto^* (v', \{\})$, then $v = v'$.

Proof: By a bi-simulation between configurations, where two configurations are
related if they are identical up to records. Records may have different sets of labels,
but common fields must themselves be related. It is easy to see that each step of the
operational semantics preserves this relation. ■



However, guessing too few or too many labels can get the program stuck.
Fortunately, for well-typed programs there always exists a good choice. The correct choice
can be made deterministically by taking the result of type inference into account,
giving rise to a type soundness theorem for EL. Type soundness is expressed in terms
of a well-formedness condition $\vdash (E[e], E_{exn})\ \mathbf{wf}$ on configurations. Along with the
well-formedness of a configuration, we define typing rules for a full context $(E, E_{exn})$
of $r$ given a reducible configuration $(E[r], E_{exn})$ in Figure 3.13.

Definition 3.3.2 (Well-formedness of a configuration)

$$
\frac{\emptyset; \Gamma_0 \vdash e : \tau; \rho \qquad \vdash (E, E_{exn}) : \tau; \rho}
     {\vdash (E[e], E_{exn})\ \mathbf{wf}}
$$

Then, we can prove type soundness using the standard technique of preservation
and progress. Before we can proceed to establishing them, we need a few technical
lemmas. Some of them are standard: inversion, canonical forms, substitution, and
weakening.

Lemma 3.3.3 (Canonical forms)

1. If $v$ is a value of type int, then $v = n$.

2. If $v$ is a value of type $\tau_1 \xrightarrow{\rho} \tau_2$, then $v = \mathbf{fun}\ f\ x = e$.

3. If $v$ is a value of type $\{ l_i : \tau_i \}_{i=1}^{n}$, then $v = \{ l_i = v_i \}_{i=1}^{n}$.

4. If $v$ is a value of type $\langle\rho\rangle$, then $v = l\ v'$.

5. If $v$ is a value of type $\langle\rho_1\rangle \stackrel{\rho}{\hookrightarrow} \tau$, then $v = \{ l_i\ x_i \Rightarrow e_i \}_{i=1}^{n}$ for some $n$.

Proof: By induction on $\tau$ with the inversion lemma. ■

[Figure 3.13: Typing rules for a full context of $r$, given a reducible configuration $(E[r], E_{exn})$.]



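Lemma 3.3.4 (Substitution)

If $\emptyset; \Gamma_0, x : \forall\alpha : \kappa.\tau' \vdash e : \tau; \rho$ and $\emptyset, \alpha : \kappa; \Gamma_0 \vdash v : \tau'; \rho$, then $\emptyset; \Gamma_0 \vdash e[v/x] : \tau; \rho$.
Proof: By induction on $e$. ■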

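Lemma 3.3.5 (Weakening)

1. If $\emptyset; \Gamma_0 \vdash e : \tau; \rho$ and $x \notin \mathrm{Dom}(\Gamma_0)$, then $\emptyset; \Gamma_0, x : \tau' \vdash e : \tau; \rho$.

2. If $\emptyset; \Gamma_0 \vdash e : \tau; \rho$, then $\alpha_1 : \kappa_1, \ldots, \alpha_n : \kappa_n; \Gamma_0 \vdash e : \tau; \rho$.

Proof: By induction on $e$. ■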

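In addition to the standard lemmas, we establish two special lemmas to simplify
the main lemma:
Lemma 3.3.6 (Restore)

1. If $\vdash (E, E_{exn}) : \tau'; \rho$ and $\emptyset; \Gamma_0, x : \tau \vdash e : \tau'; \rho$,
then $\vdash_\rho \{ l = E[\mathbf{let}\ x = \mathbf{restore}\ E_{exn}\ [\,]\ \mathbf{in}\ e] \} : l : \tau$.

2. If $\vdash (E, E_{exn}) : \tau'; \rho$ and $\emptyset; \Gamma_0, x : \langle l_1 : \tau_1, \ldots, l_n : \tau_n \rangle \vdash e : \tau'; \rho$,
then $\vdash_\rho \{ l_i = E[\mathbf{let}\ x = l_i\ (\mathbf{restore}\ E_{exn}\ [\,])\ \mathbf{in}\ e] \}_{i=1}^{n} : l_1 : \tau_1, \ldots, l_n : \tau_n$.

Proof: By typing rules for a full context. ■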



Lemma 3.3.7 (Exception context)

If $\vdash (E, E_{exn}) : \tau; \rho$, then $\vdash_\rho E_{exn} : \rho$.
Proof: By induction on $E$. ■

Given these we can show preservation:
Lemma 3.3.8 (Preservation)

If $\vdash (E[e], E_{exn})\ \mathbf{wf}$ and $(E[e], E_{exn}) \mapsto (E'[e'], E'_{exn})$, then $\vdash (E'[e'], E'_{exn})\ \mathbf{wf}$.

Proof: The proof proceeds by case analysis according to the derivation of
$(E[e], E_{exn}) \mapsto (E'[e'], E'_{exn})$. The cases are entirely standard except that some
cases use Lemma 3.3.7 and Lemma 3.3.6. We present one such case as an example.

• Case HANDLE: $(E[e_1\ \mathbf{handle}\ \{ l\ x \Rightarrow e_2 \}], E_{exn}) \mapsto (E[\mathbf{restore}\ E_{exn}\ e_1], E'_{exn})$.
By given, $\vdash (E[e_1\ \mathbf{handle}\ \{ l\ x \Rightarrow e_2 \}], E_{exn})\ \mathbf{wf}$. Then, by Definition 3.3.2,
we know that $\emptyset; \Gamma_0 \vdash e_1\ \mathbf{handle}\ \{ l\ x \Rightarrow e_2 \} : \tau; \rho$ and $\vdash (E, E_{exn}) : \tau; \rho$ (3).
By inversion of HANDLE, $\emptyset; \Gamma_0 \vdash e_1 : \tau; l : \tau', \rho$ (1) and $\emptyset; \Gamma_0, x : \tau' \vdash e_2 : \tau; \rho$ (2).
To show (TS): $\vdash (E[\mathbf{restore}\ E_{exn}\ e_1], E'_{exn})\ \mathbf{wf}$. It suffices to show
(STS): $\vdash (E[\mathbf{restore}\ E_{exn}\ [\,]], E'_{exn}) : \tau; l : \tau', \rho$ because of (1). Then, with (3),
STS: $\vdash_\rho E'_{exn} : l : \tau', \rho$. By the exception context lemma, (3) also shows that
$\vdash_\rho E_{exn} : \rho$. Because $E'_{exn} = E_{exn} \otimes \{ l = E[\mathbf{let}\ x = \mathbf{restore}\ E_{exn}\ [\,]\ \mathbf{in}\ e_2] \}$,
we only need to show that $\vdash_\rho \{ l = E[\mathbf{let}\ x = \mathbf{restore}\ E_{exn}\ [\,]\ \mathbf{in}\ e_2] \} : l : \tau'$,
which is true by the restore lemma with (3) and (2). ■



To prove progress, we need the unique decomposition lemma:
Lemma 3.3.9 (Unique decomposition)

Let $e$ be a closed term but not a value. Then, there exist unique $E$ and redex $r$ such
that $e = E[r]$.
Proof: By definition of $E$. ■

Given this lemma, we can show progress:
Lemma 3.3.10 (Progress)

If a configuration $(e, E_{exn})$ is well-formed, either it is a final configuration $(v, \{\})$ or
else there exists a single-step transition to another configuration, i.e.,
$(E[e'], E_{exn}) \mapsto (E''[e''], E'_{exn})$ where $e = E[e']$.

Proof: Value terms are immediately final configurations by definition. For
non-value terms, there exist unique $E$ and $e'$ such that $e = E[e']$ by Lemma 3.3.9.
Then, the proof proceeds by case analysis on $e'$. ■

The main result is the type soundness (i.e., safety) of the EL programs:
Theorem 3.3.11 (Type soundness)

If a configuration is well-formed, either it is a final configuration or else there exists
a single-step transition to another well-formed configuration.

Proof: Type soundness follows from the preservation and progress lemmas. ■

Corollary 3.3.12 (Type safe exception handling)

Well-typed EL programs do not have uncaught exceptions.

Proof: By Theorem 3.3.11. ■

3.4 The Internal Language (IL)

EL expressions can be translated into expressions of a variant of System F with records
and named functions. We call this language IL. Recall that the semantics for EL shown
in Figure 3.9 uses non-determinism in its HANDLE-ALL rule. The need for this arises
because with $e_1\ \mathbf{handle}\ \{\, x \Rightarrow e_2 \,\}$ a new exception context with one field for every
exception that $e_1$ might raise must be built. This set of exceptions is not always fixed
and does not only depend on $e_1$ itself: exceptions can be passed in, either directly
as first-class values or perhaps by way of functional parameters to higher-order
functions. Therefore, to remove the non-determinism, a combination of static analysis
and runtime techniques is needed.

In essence, we need access to the type of $e_1$, and we must be able to utilize this
type when building a new exception context. To make this idea precise, we provide 
an elaboration semantics for EL. We define an explicitly typed internal language 
IL and augment the EL typing judgments with a translation component. IL is a 
variant of System F enriched with extensible records as well as a special type-sensitive 
reify construct which provides the "canonical" translation from functions on sums 
to records of functions. Using reify we are able to give a deterministic account of 
"catch-all" exception handlers. 

Unlike EL, IL does not have dedicated mechanisms for raising and handling ex- 
ceptions. Therefore, we will use continuation passing style and represent exception 
contexts explicitly as extensible records of continuations. In EL, exceptions are simply 
members of a sum type, and the translation treats them as such: they are translated 
via dual transformation into polymorphic functions on records of functions. There- 
fore, they are applicable to both exception contexts (i.e., records of continuations) 
and to first-class cases (i.e., records of ordinary functions). 
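The dual encoding can be tried out in a few lines of untyped code. The sketch below is an illustration under stated assumptions, not the translation itself: records are Python dicts, the polymorphic type abstraction is erased, and the names `inject`, `area`, and `handlers` are hypothetical.

```python
def inject(label, payload):
    """Dual encoding of the sum value `label payload`: a function that takes a
    record of functions, selects the field `label`, and applies it to the payload."""
    return lambda record: record[label](payload)

# A first-class case: a record of ordinary functions.
area = {
    "Square": lambda side: side * side,
    "Rect": lambda dims: dims[0] * dims[1],
}

# An exception context: a record of continuations with the same labels.
handlers = {
    "Square": lambda side: ("handled Square", side),
    "Rect": lambda dims: ("handled Rect", dims),
}

shape = inject("Rect", (2, 5))
matched = shape(area)        # plays the role of `match`
raised = shape(handlers)     # plays the role of `raise`
```

The same encoded value `shape` is applied to both kinds of record, which is the point made above: one representation serves both case matching and exception raising.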
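$$
\begin{array}{llcl}
\text{Terms} & \bar e & ::= & n \mid x \mid \lambda x : \bar\tau.\bar e \mid \Lambda\alpha : \kappa.\bar e \mid \bar e_1\ \bar e_2 \mid \bar e[\bar\theta] \mid \mathbf{let}\ x : \bar\tau = \bar e_1\ \mathbf{in}\ \bar e_2 \mid {} \\
 & & & \mathbf{letrec}\ f : \bar\tau = \lambda x : \bar\tau_2.\bar e_1\ \mathbf{in}\ \bar e_2 \mid \mathbf{letrec}\ f : \bar\tau = \Lambda\alpha : \kappa.\bar e_1\ \mathbf{in}\ \bar e_2 \mid {} \\
 & & & \{ l_i = \bar e_i \}_{i=1}^{n} \mid \bar e_1 \otimes \{ l = \bar e_2 \} \mid \bar e \oslash l \mid \bar e.l \mid \mathbf{reify}[\bar\rho][\bar\tau]\ \bar e \\
\text{Values} & v & ::= & n \mid \lambda x : \bar\tau.\bar e \mid \Lambda\alpha : \kappa.\bar e \mid \{ l_i = v_i \}_{i=1}^{n} \\
\text{Types} & \bar\tau & ::= & \alpha \mid \mathsf{int} \mid \bar\tau_1 \to \bar\tau_2 \mid \{\bar\rho\} \mid \forall\alpha : \kappa.\bar\tau \mid \alpha\ \mathbf{as}\ \bar\tau \\
 & \bar\rho & ::= & \alpha \mid \cdot \mid l : \bar\tau, \bar\rho \mid \alpha \rightarrowtail \bar\tau \\
 & \bar\theta & ::= & \bar\tau \mid \bar\rho
\end{array}
$$

Figure 3.14: Internal language (IL) syntax.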

3.4.1 Syntax and semantics 

Figure 3.14 shows the syntax for IL. We use meta-variables such as $\bar e$, $\bar\tau$, and $\bar\rho$
for terms and types of IL to visually distinguish them from their EL counterparts
$e$, $\tau$, and $\rho$. The term language consists of constants ($n$), variables ($x$), term and
type abstractions ($\lambda x : \bar\tau.\bar e$ and $\Lambda\alpha : \kappa.\bar e$), term and type applications ($\bar e_1\ \bar e_2$ and
$\bar e[\bar\theta]$), recursive bindings for abstractions (letrec), let-bindings, records (including
constructs for creation $\{ l = \bar e \}$, extension $\otimes$, field deletion $\oslash$, and projection $\bar e.l$),
as well as the aforementioned reify operation, which turns functions on sums into
corresponding records of functions. IL types consist of ordinary types $\bar\tau$ and row types
$\bar\rho$. Ordinary types include base types (int), function types ($\bar\tau_1 \to \bar\tau_2$), records ($\{\bar\rho\}$),
polymorphic types ($\forall\alpha : \kappa.\bar\tau$), recursive types ($\alpha\ \mathbf{as}\ \bar\tau$) and (appropriately kinded)
type variables $\alpha$. The set of type variables and their kinds is shared between EL and
IL. Row types are either the empty row ($\cdot$), a typed label followed by another row
type ($l : \bar\tau, \bar\rho$), a row type variable ($\alpha$) or a row arrow applied to a row type variable
and a type ($\alpha \rightarrowtail \bar\tau$). The key difference between the row types of EL and IL is the
inclusion of such row arrows. They are critical to represent sums and cases in terms
of records. As usual, well-formedness of potentially open type terms is stated relative
to a kinding environment $\Delta$ mapping type variables to their kinds, so judgments have
the form $\Delta \vdash \bar\tau : \kappa$. For brevity we omit the rules because they are either standard or
closely follow the ones we used for EL (see Figure 3.7).

A small-step operational semantics for IL is shown in Figure 3.16. With the
exception of reify, most rules are standard. There are three substitution definitions:
one for free variables (Figure 3.17) and two for free type variables (Figures 3.18 and 3.19).
For example, let $\bar\rho = l_1 : \bar\tau_1, \ldots, l_n : \bar\tau_n, \cdot$ and consider $(\alpha \rightarrowtail \bar\tau)[\bar\rho/\alpha]$. Substitution
cannot simply replace $\alpha$ with $\bar\rho$, since the result would not even be syntactically
valid. Instead, it must normalize, resulting in $l_1 : (\bar\tau_1 \to \bar\tau'), \ldots, l_n : (\bar\tau_n \to \bar\tau'), \cdot$
where $\bar\tau' = \bar\tau[\bar\rho/\alpha]$.
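The normalizing behaviour of substitution on row arrows can be sketched as follows. The representation is an assumption made for illustration only: base types and type variables are strings, a function type is `("->", t1, t2)`, a record type is `("record", row)`, a closed row is a list of `(label, type)` pairs, a row variable is `("var", name)`, and a row arrow is `("arrow", name, type)`. The substituted row is assumed to be closed, as in the example above.

```python
def subst_type(t, alpha, row):
    """Substitute the closed row `row` for row variable `alpha` inside type `t`."""
    if isinstance(t, str):                      # base type or type variable
        return t
    if t[0] == "->":
        return ("->", subst_type(t[1], alpha, row), subst_type(t[2], alpha, row))
    if t[0] == "record":
        return ("record", subst_row(t[1], alpha, row))
    raise ValueError(f"unknown type: {t!r}")

def subst_row(r, alpha, row):
    """Row-normalizing substitution of `row` for `alpha` inside row `r`."""
    if isinstance(r, list):                     # closed row l1 : t1, ..., ln : tn, .
        return [(l, subst_type(t, alpha, row)) for l, t in r]
    if r[0] == "var":
        return row if r[1] == alpha else r
    if r[0] == "arrow":                         # row arrow: beta >-> tau
        beta, tau = r[1], subst_type(r[2], alpha, row)
        if beta != alpha:
            return ("arrow", beta, tau)
        # Normalize: distribute the arrow over every field of the row.
        return [(l, ("->", t, tau)) for l, t in row]
    raise ValueError(f"unknown row: {r!r}")

rho = [("l1", "int"), ("l2", ("->", "int", "int"))]
normalized = subst_row(("arrow", "a", "ans"), "a", rho)
```

Here `normalized` is a closed row in which every field type `t_i` has become `t_i -> ans`; a naive replacement of `a` by `rho` would instead have produced an arrow applied to a full row, which the IL grammar does not allow.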

Figure 3.20 shows typing rules for IL, which are mostly standard with the exception
of reify. The rule for type application involves type substitution, and, as before,
we must use a row-normalizing version of substitution. A formal definition of row
normalization as a judgment is shown in Figure 3.21.



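$$
\begin{array}{rcl}
E & ::= & [\,] \mid E\ \bar e \mid v\ E \mid E\,[\bar\tau] \mid \mathbf{let}\ x : \bar\tau = E\ \mathbf{in}\ \bar e_2 \mid E \otimes \{ l = \bar e_2 \} \mid v \otimes \{ l = E \} \mid {} \\
 & & \{ \ldots, l_{i-1} = v_{i-1}, l_i = E, l_{i+1} = \bar e_{i+1}, \ldots \} \mid E \oslash l \mid E.l
\end{array}
$$

Figure 3.15: Evaluation contexts for IL.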



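$$
\begin{array}{rcll}
E[(\lambda x : \bar\tau.\bar e)\ v] & \mapsto & E[\bar e\,[v/x]] & \text{(app)} \\
E[(\Lambda\alpha : \kappa.\bar e)\,[\bar\tau]] & \mapsto & E[\bar e\,[\bar\tau/\alpha]] & \text{(type/app)} \\
E[\mathbf{let}\ x : \bar\tau = v\ \mathbf{in}\ \bar e] & \mapsto & E[\bar e\,[v/x]] & \text{(let)} \\
E[\mathbf{letrec}\ f : \bar\tau = \lambda x : \bar\tau_2.\bar e_1\ \mathbf{in}\ \bar e_2] & \mapsto & E[\bar e_2\,[(\lambda x : \bar\tau_2.\bar e_1\,[(\mathbf{letrec}\ f : \bar\tau = \lambda x : \bar\tau_2.\bar e_1\ \mathbf{in}\ f)/f])/f]] & \text{(rec/fun)} \\
E[\mathbf{letrec}\ f : \bar\tau = \Lambda\alpha : \kappa.\bar e_1\ \mathbf{in}\ \bar e_2] & \mapsto & E[\bar e_2\,[(\Lambda\alpha : \kappa.\bar e_1\,[(\mathbf{letrec}\ f : \bar\tau = \Lambda\alpha : \kappa.\bar e_1\ \mathbf{in}\ f)/f])/f]] & \text{(polyrec/fun)} \\
E[\{ l_i = v_i \}_{i=1}^{n} \otimes \{ l = v \}] & \mapsto & E[\{ l_1 = v_1, \ldots, l_n = v_n, l = v \}] & \text{(r/ext)} \\
E[\{ \ldots, l_i = v_i, \ldots \} \oslash l_i] & \mapsto & E[\{ \ldots, l_{i-1} = v_{i-1}, l_{i+1} = v_{i+1}, \ldots \}] & \text{(r/sub)} \\
E[\{ \ldots, l_i = v_i, \ldots \}.l_i] & \mapsto & E[v_i] & \text{(select)} \\
E[\mathbf{reify}[l_1 : \bar\tau_1, \ldots, l_n : \bar\tau_n, \cdot][\bar\tau]\ v] & \mapsto & E[\{ l_i = \lambda x_i : \bar\tau_i.\, v\ (\Lambda\alpha : \star.\lambda c : \{ l_j : \bar\tau_j \to \alpha \}_{j=1}^{n}.\, c.l_i\ x_i) \}_{i=1}^{n}] & \text{(reify)}
\end{array}
$$

Figure 3.16: Operational semantics for IL.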
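$$
\begin{array}{rcll}
n\,[v/x] & = & n & \\
x\,[v/x] & = & v & \\
y\,[v/x] & = & y & \text{if } x \neq y \\
(\lambda x : \bar\tau.\bar e)\,[v/x] & = & \lambda x : \bar\tau.\bar e & \\
(\lambda y : \bar\tau.\bar e)\,[v/x] & = & \lambda y : \bar\tau.(\bar e\,[v/x]) & \text{if } x \neq y,\ y \notin \mathrm{FV}(v) \\
(\Lambda\alpha : \kappa.\bar e)\,[v/x] & = & \Lambda\alpha : \kappa.(\bar e\,[v/x]) & \\
(\bar e_1\ \bar e_2)\,[v/x] & = & (\bar e_1\,[v/x])\ (\bar e_2\,[v/x]) & \\
(\bar e[\bar\theta])\,[v/x] & = & (\bar e\,[v/x])[\bar\theta] & \\
(\mathbf{letrec}\ f : \bar\tau = \lambda x : \bar\tau_2.\bar e_1\ \mathbf{in}\ \bar e_2)\,[v/f] & = & \mathbf{letrec}\ f : \bar\tau = \lambda x : \bar\tau_2.\bar e_1\ \mathbf{in}\ \bar e_2 & \\
(\mathbf{letrec}\ f : \bar\tau = \lambda x : \bar\tau_2.\bar e_1\ \mathbf{in}\ \bar e_2)\,[v/x] & = & \mathbf{letrec}\ f : \bar\tau = (\lambda x : \bar\tau_2.\bar e_1)\,[v/x]\ \mathbf{in}\ (\bar e_2\,[v/x]) & \\
(\mathbf{letrec}\ f : \bar\tau = \Lambda\alpha : \kappa.\bar e_1\ \mathbf{in}\ \bar e_2)\,[v/f] & = & \mathbf{letrec}\ f : \bar\tau = \Lambda\alpha : \kappa.\bar e_1\ \mathbf{in}\ \bar e_2 & \\
(\mathbf{letrec}\ f : \bar\tau = \Lambda\alpha : \kappa.\bar e_1\ \mathbf{in}\ \bar e_2)\,[v/x] & = & \mathbf{letrec}\ f : \bar\tau = \Lambda\alpha : \kappa.(\bar e_1\,[v/x])\ \mathbf{in}\ (\bar e_2\,[v/x]) & \\
(\mathbf{let}\ x : \bar\tau = \bar e_1\ \mathbf{in}\ \bar e_2)\,[v/x] & = & \mathbf{let}\ x : \bar\tau = (\bar e_1\,[v/x])\ \mathbf{in}\ \bar e_2 & \\
(\mathbf{let}\ y : \bar\tau = \bar e_1\ \mathbf{in}\ \bar e_2)\,[v/x] & = & \mathbf{let}\ y : \bar\tau = (\bar e_1\,[v/x])\ \mathbf{in}\ (\bar e_2\,[v/x]) & \text{if } x \neq y \\
(\{ l_i = \bar e_i \}_{i=1}^{n})\,[v/x] & = & \{ l_i = \bar e_i\,[v/x] \}_{i=1}^{n} & \\
(\bar e_1 \otimes \{ l = \bar e_2 \})\,[v/x] & = & (\bar e_1\,[v/x]) \otimes \{ l = \bar e_2\,[v/x] \} & \\
(\bar e \oslash l)\,[v/x] & = & (\bar e\,[v/x]) \oslash l & \\
(\bar e.l)\,[v/x] & = & (\bar e\,[v/x]).l & \\
(\mathbf{reify}[\bar\rho][\bar\tau]\ \bar e)\,[v/x] & = & \mathbf{reify}[\bar\rho][\bar\tau]\ (\bar e\,[v/x]) &
\end{array}
$$

Figure 3.17: Substituting $v$ for free variable $x$, $\bar e\,[v/x]$.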





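$$
\begin{array}{rcll}
n\,[\bar\theta/\alpha] & = & n & \\
x\,[\bar\theta/\alpha] & = & x & \\
(\lambda x : \bar\tau'.\bar e)\,[\bar\theta/\alpha] & = & \lambda x : \bar\tau'[\bar\theta/\alpha].(\bar e\,[\bar\theta/\alpha]) & \\
(\Lambda\alpha : \kappa.\bar e)\,[\bar\theta/\alpha] & = & \Lambda\alpha : \kappa.\bar e & \\
(\Lambda\beta : \kappa.\bar e)\,[\bar\theta/\alpha] & = & \Lambda\beta : \kappa.(\bar e\,[\bar\theta/\alpha]) & \text{if } \alpha \neq \beta,\ \beta \notin \mathrm{FTV}(\bar\theta) \\
(\bar e_1\ \bar e_2)\,[\bar\theta/\alpha] & = & (\bar e_1\,[\bar\theta/\alpha])\ (\bar e_2\,[\bar\theta/\alpha]) & \\
(\bar e[\bar\theta'])\,[\bar\theta/\alpha] & = & (\bar e\,[\bar\theta/\alpha])[\bar\theta'[\bar\theta/\alpha]] & \\
(\mathbf{letrec}\ f : \bar\tau_1 = \lambda x : \bar\tau_2.\bar e_1\ \mathbf{in}\ \bar e_2)\,[\bar\theta/\alpha] & = & \mathbf{letrec}\ f : \bar\tau_1[\bar\theta/\alpha] = \lambda x : \bar\tau_2[\bar\theta/\alpha].(\bar e_1\,[\bar\theta/\alpha])\ \mathbf{in}\ (\bar e_2\,[\bar\theta/\alpha]) & \\
(\mathbf{letrec}\ f : \bar\tau' = \Lambda\alpha : \kappa.\bar e_1\ \mathbf{in}\ \bar e_2)\,[\bar\theta/\alpha] & = & \mathbf{letrec}\ f : \bar\tau'[\bar\theta/\alpha] = \Lambda\alpha : \kappa.\bar e_1\ \mathbf{in}\ (\bar e_2\,[\bar\theta/\alpha]) & \\
(\mathbf{letrec}\ f : \bar\tau' = \Lambda\beta : \kappa.\bar e_1\ \mathbf{in}\ \bar e_2)\,[\bar\theta/\alpha] & = & \mathbf{letrec}\ f : \bar\tau'[\bar\theta/\alpha] = \Lambda\beta : \kappa.(\bar e_1\,[\bar\theta/\alpha])\ \mathbf{in}\ (\bar e_2\,[\bar\theta/\alpha]) & \text{if } \alpha \neq \beta,\ \beta \notin \mathrm{FTV}(\bar\theta) \\
(\mathbf{let}\ x : \bar\tau' = \bar e_1\ \mathbf{in}\ \bar e_2)\,[\bar\theta/\alpha] & = & \mathbf{let}\ x : \bar\tau'[\bar\theta/\alpha] = (\bar e_1\,[\bar\theta/\alpha])\ \mathbf{in}\ (\bar e_2\,[\bar\theta/\alpha]) & \\
(\{ l_i = \bar e_i \}_{i=1}^{n})\,[\bar\theta/\alpha] & = & \{ l_i = \bar e_i\,[\bar\theta/\alpha] \}_{i=1}^{n} & \\
(\bar e_1 \otimes \{ l = \bar e_2 \})\,[\bar\theta/\alpha] & = & (\bar e_1\,[\bar\theta/\alpha]) \otimes \{ l = \bar e_2\,[\bar\theta/\alpha] \} & \\
(\bar e \oslash l)\,[\bar\theta/\alpha] & = & (\bar e\,[\bar\theta/\alpha]) \oslash l & \\
(\bar e.l)\,[\bar\theta/\alpha] & = & (\bar e\,[\bar\theta/\alpha]).l & \\
(\mathbf{reify}[\bar\rho][\bar\tau']\ \bar e)\,[\bar\theta/\alpha] & = & \mathbf{reify}[\bar\rho\,[\bar\theta/\alpha]][\bar\tau'[\bar\theta/\alpha]]\ (\bar e\,[\bar\theta/\alpha]) &
\end{array}
$$

Figure 3.18: Substituting $\bar\theta$ for free type variable $\alpha$, $\bar e\,[\bar\theta/\alpha]$.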



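$$
\begin{array}{rcll}
\alpha\,[\bar\theta/\alpha] & = & \bar\theta & \\
\beta\,[\bar\theta/\alpha] & = & \beta & \text{if } \alpha \neq \beta \\
\mathsf{int}\,[\bar\theta/\alpha] & = & \mathsf{int} & \\
(\bar\tau_1 \to \bar\tau_2)\,[\bar\theta/\alpha] & = & (\bar\tau_1\,[\bar\theta/\alpha]) \to (\bar\tau_2\,[\bar\theta/\alpha]) & \\
(\forall\alpha : \kappa.\bar\tau)\,[\bar\theta/\alpha] & = & \forall\alpha : \kappa.\bar\tau & \\
(\forall\beta : \kappa.\bar\tau)\,[\bar\theta/\alpha] & = & \forall\beta : \kappa.(\bar\tau\,[\bar\theta/\alpha]) & \text{if } \alpha \neq \beta,\ \beta \notin \mathrm{FTV}(\bar\theta) \\
\{\bar\rho\}\,[\bar\theta/\alpha] & = & \{\bar\rho\,[\bar\theta/\alpha]\} & \\
(\alpha\ \mathbf{as}\ \bar\tau)\,[\bar\theta/\alpha] & = & \alpha\ \mathbf{as}\ \bar\tau & \\
(\beta\ \mathbf{as}\ \bar\tau)\,[\bar\theta/\alpha] & = & \beta\ \mathbf{as}\ (\bar\tau\,[\bar\theta/\alpha]) & \text{if } \alpha \neq \beta \\
\cdot\,[\bar\theta/\alpha] & = & \cdot & \\
(l : \bar\tau', \bar\rho)\,[\bar\theta/\alpha] & = & l : \bar\tau'[\bar\theta/\alpha],\ \bar\rho\,[\bar\theta/\alpha] & \\
(\beta \rightarrowtail \bar\tau')\,[\bar\theta/\alpha] & = & \beta \rightarrowtail (\bar\tau'[\bar\theta/\alpha]) & \text{if } \alpha \neq \beta \\
(\alpha \rightarrowtail \bar\tau)\,[\cdot/\alpha] & = & \cdot & \\
(\alpha \rightarrowtail \bar\tau')\,[\beta/\alpha] & = & \beta \rightarrowtail (\bar\tau'[\beta/\alpha]) & \\
(\alpha \rightarrowtail \bar\tau)\,[(l : \bar\tau_1, \bar\rho)/\alpha] & = & l : \bar\tau_1 \to \bar\tau',\ (\alpha \rightarrowtail \bar\tau)\,[\bar\rho/\alpha] & \text{where } \bar\tau' = \bar\tau\,[(l : \bar\tau_1, \bar\rho)/\alpha]
\end{array}
$$

Figure 3.19: Substituting $\bar\theta$ for free type variable $\alpha$, $\bar\theta'[\bar\theta/\alpha]$.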



3.4.2 Properties of IL

To prove type soundness, we need some standard lemmas such as the substitution and
canonical forms lemmas:

Lemma 3.4.1 (Substitution)

If $\emptyset; \emptyset, x : \bar\tau' \vdash \bar e : \bar\tau$ and $\emptyset; \emptyset \vdash v : \bar\tau'$, then $\emptyset; \emptyset \vdash \bar e\,[v/x] : \bar\tau$.

Proof: By induction on a derivation of $\emptyset; \emptyset, x : \bar\tau' \vdash \bar e : \bar\tau$. ■

Lemma 3.4.2 (Type substitution)

If $\emptyset, \alpha : \kappa; \emptyset \vdash \bar e : \bar\tau$ and $\vdash \bar\theta : \kappa$, then $\emptyset; \emptyset \vdash \bar e\,[\bar\theta/\alpha] : \bar\tau\,[\bar\theta/\alpha]$.

Proof: By induction on a derivation of $\emptyset, \alpha : \kappa; \emptyset \vdash \bar e : \bar\tau$. Similar to the proof of
Lemma 3.4.1. ■

Lemma 3.4.3 (Canonical forms)

1. If $v$ is a value of type int, then $v = n$.

2. If $v$ is a value of type $\bar\tau_1 \to \bar\tau_2$, then $v = \lambda x : \bar\tau_1.\bar e$.

3. If $v$ is a value of type $\forall\alpha : \kappa.\bar\tau$, then $v = \Lambda\alpha : \kappa.\bar e$.

4. If $v$ is a value of type $\{\bar\rho\}$, then $v = \{ l_i = v_i \}_{i=1}^{n}$ for some $n$.

Proof: By induction on $\bar\tau$ with the inversion lemma. ■

We can prove type soundness using the standard technique of preservation and
progress:

Lemma 3.4.4 (Preservation)

If $\emptyset; \emptyset \vdash \bar e : \bar\tau$ and $E[\bar e] \mapsto E[\bar e']$, then $\emptyset; \emptyset \vdash \bar e' : \bar\tau$.

Proof: The proof proceeds by case analysis according to the derivation of
$E[\bar e] \mapsto E[\bar e']$. The cases are entirely standard except for the reify expression. We present
only this.

• Case $\bar e = \mathbf{reify}[l_1 : \bar\tau_1, \ldots, l_n : \bar\tau_n, \cdot][\bar\tau']\ v$ and
$\bar e' = \{ l_i = \lambda x_i : \bar\tau_i.\, v\ (\Lambda\alpha : \star.\lambda c : \{ l_j : \bar\tau_j \to \alpha \}_{j=1}^{n}.\, c.l_i\ x_i) \}_{i=1}^{n}$.
By given, $\emptyset; \emptyset \vdash \mathbf{reify}[l_1 : \bar\tau_1, \ldots, l_n : \bar\tau_n, \cdot][\bar\tau']\ v : \bar\tau$
where $\bar\tau = \{ \bar\rho \rightarrowtail \bar\tau' \} = \{ l_1 : \bar\tau_1 \to \bar\tau', \ldots, l_n : \bar\tau_n \to \bar\tau' \}$.
By inversion of T-REIFY, $\emptyset; \emptyset \vdash v : (\!|\bar\rho|\!) \to \bar\tau'$ (1).
To show (TS): $\emptyset; \emptyset \vdash \{ l_i = \lambda x_i : \bar\tau_i.\, v\ (\Lambda\alpha : \star.\lambda c : \{ l_j : \bar\tau_j \to \alpha \}_{j=1}^{n}.\, c.l_i\ x_i) \}_{i=1}^{n} : \bar\tau$.
By inversion of T-R and T-ABS, it suffices to show (STS):
$\forall i.\ \emptyset; \emptyset, x_i : \bar\tau_i \vdash v\ (\Lambda\alpha : \star.\lambda c : \{ l_j : \bar\tau_j \to \alpha \}_{j=1}^{n}.\, c.l_i\ x_i) : \bar\tau'$.
By inversion of T-APP, STS: $\forall i.\ \emptyset; \emptyset, x_i : \bar\tau_i \vdash v : (\!|\bar\rho|\!) \to \bar\tau'$, which is true by (1), and
$\forall i.\ \emptyset; \emptyset, x_i : \bar\tau_i \vdash \Lambda\alpha : \star.\lambda c : \{ l_j : \bar\tau_j \to \alpha \}_{j=1}^{n}.\, c.l_i\ x_i : (\!|\bar\rho|\!)$, which is provable
by the typing rules.

■

[Figure 3.20: The static semantics for IL. Rules: T-INT, T-VAR, T-ABS, T-APP, T-ABS/TYPE, T-APP/TYPE, T-LET, T-LETREC, T-LETREC/TYPE, ROLL, UNROLL, T-R, T-SELECT, T-R/EXT, T-R/SUB, T-REIFY.]

[Figure 3.21: Row arrow normalization, defining the judgment $\bar\rho; \bar\tau \blacktriangleright \bar\rho'$.]

Lemma 3.4.5 (Progress)

If $\emptyset; \emptyset \vdash \bar e : \bar\tau$, then either $\bar e$ is a value or else there is some $\bar e'$ with $\bar e \mapsto E[\bar e']$ where
$\bar e = E[r]$ and $r$ is a redex.

Proof: By induction on a derivation of $\emptyset; \emptyset \vdash \bar e : \bar\tau$. The cases are entirely standard
except for the reify expression. We present only this.

• Case $\bar e = \mathbf{reify}[\bar\rho][\bar\tau]\ \bar e_1$.

By given, $\emptyset; \emptyset \vdash \mathbf{reify}[\bar\rho][\bar\tau]\ \bar e_1 : \{ \bar\rho \rightarrowtail \bar\tau \}$. By inversion of T-REIFY,
$\emptyset; \emptyset \vdash \bar e_1 : (\!|\bar\rho|\!) \to \bar\tau$. Because of its type, $\bar e_1$ should be a function, which is a value. Then,
done by reify. ■

The main result is the type soundness of the IL programs:
Theorem 3.4.6 (Type soundness)

If $\emptyset; \emptyset \vdash \bar e : \bar\tau$, either $\bar e$ is a value or else there is some $\bar e'$ with $\bar e \mapsto \bar e'$ where
$\emptyset; \emptyset \vdash \bar e' : \bar\tau$.

Proof: Type soundness follows from the preservation and progress lemmas. ■



3.4.3 From EL to IL 

The translation from EL into IL is somewhat involved because it performs two
transformations at once: (1) a transformation into continuation-passing style (CPS)
(Appel 1992), and (2) a dual translation that eliminates sums and cases in favor of records
of functions and polymorphic functions on such records.

There are two translation judgments: one for syntactic values, and one for all
expressions. The judgment for a syntactic value $e$ has the form $\Delta; \Gamma \vdash_v e : \tau \rightsquigarrow \bar e : \bar\tau$.
Notice the absence of exception types. Since $e$ is a value, its IL counterpart $\bar e$ requires
neither continuation nor handler. For non-values there is no derivation for a $\vdash_v$
judgment.

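The IL counterpart for non-values is a computation. Computations are suspensions
that await a continuation and a handler record. Once continuation and handlers are
supplied, a computation will run until a final answer is produced and the program
terminates. The translation of an expression $e$ to its computation counterpart is
expressed by a judgment of the form $\Delta; \Gamma \vdash e : \tau; \rho \rightsquigarrow c : (\bar\tau, \bar\rho)\ \mathsf{comp}$ where $c$ is the IL
term representing the computation denoted by $e$. The type of $c$ is always $(\bar\tau, \bar\rho)$ comp
where $\bar\tau$ and $\bar\rho$ are the IL counterparts of $\tau$ and $\rho$.

$$
\begin{array}{rcl}
\mathsf{ans} & = & \mathsf{int} \\
\bar\tau\ \mathsf{cont} & = & \bar\tau \to \mathsf{ans} \\
\bar\rho\ \mathsf{hdlr} & = & \{ \bar\rho \rightarrowtail \mathsf{ans} \} \\
(\bar\tau, \bar\rho)\ \mathsf{comp} & = & \bar\tau\ \mathsf{cont} \to \bar\rho\ \mathsf{hdlr} \to \mathsf{ans} \\
\bar\tau \xrightarrow{\bar\rho} \bar\tau' & = & \bar\tau \to (\bar\tau', \bar\rho)\ \mathsf{comp} \\
\langle\bar\rho'\rangle \stackrel{\bar\rho}{\hookrightarrow} \bar\tau & = & \{ \bar\rho' \rightarrowtail (\bar\tau, \bar\rho)\ \mathsf{comp} \} \\
(\!|\bar\rho|\!) & = & \forall\alpha : \star.\ \{ \bar\rho \rightarrowtail \alpha \} \to \alpha \\
\bar\rho \rightarrowtail \bar\tau & = & \bar\rho' \ \text{such that}\ \bar\rho; \bar\tau \blacktriangleright \bar\rho'
\end{array}
$$

Figure 3.22: Type synonyms for IL types.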

Notation: To talk about continuations, handlers, and computations, it is
convenient to introduce some notational shorthands (see Figure 3.22). We write ans for
the type of the final answer, $\bar\tau$ cont for the type of continuations accepting values of
type $\bar\tau$, $\bar\rho$ hdlr for the type of exception handlers, i.e., records of continuations whose
argument types are described by $\bar\rho$, and $(\bar\tau, \bar\rho)$ comp for the type of computations
awaiting a $\bar\tau$ cont and a $\bar\rho$ hdlr. The CPS-converted IL equivalent of an EL function
type is $\bar\tau_1 \xrightarrow{\bar\rho} \bar\tau_2$. It describes functions from $\bar\tau_1$ to $(\bar\tau_2, \bar\rho)$ comp. Similarly, the type
$\langle\bar\rho\rangle \stackrel{\bar\rho'}{\hookrightarrow} \bar\tau$ is the IL encoding of a first-class cases type, i.e., a record of functions that
produce computations of type $(\bar\tau, \bar\rho')$ comp. Finally, $(\!|\bar\rho|\!)$ is the dual encoding of a
sum: the polymorphic type of functions from records of functions to their common
co-domain.

Notice that most of the type synonyms in Figure 3.22 make use of the notation
$\bar\rho \rightarrowtail \bar\tau$. It stands for the unique row type $\bar\rho'$ for which the row normalization judgment
$\bar\rho; \bar\tau \blacktriangleright \bar\rho'$ holds (see Figure 3.21). Our presentation relies on the convention that
any direct or indirect use of the $\rightarrowtail$ shorthand in a rule introduces an implicit row
normalization judgment to the premises of that rule.

To improve the readability of the rules, we omit many "obvious" types from IL
terms. For example, we write $\lambda k.\lambda h.\bar e : (\bar\tau, \bar\rho)$ comp without the types for $k$ and $h$,
since these types clearly can only be $\bar\tau$ cont and $\bar\rho$ hdlr, respectively.

Type translation: Figure 3.23 shows the translation of EL types to IL types.
The use of type synonyms makes the presentation look straightforward. (But beware
of implicit normalization judgments!)

Value translation: Figure 3.24 shows the translation of syntactic values:
constants, variables, functions, and cases. Constants are trivial while variables may
produce type applications if their types are polymorphic.

The transformation of functions depends on whether the body itself is a syntactic
value or not. If the body $e$ of function $f$ is a value, then it is transformed as a value,
i.e., using the $\vdash_v$ judgment, into an IL term $\bar e$. Then a recursively polymorphic
CPS function is constructed. When instantiated and applied, it simply passes $\bar e$ to its
continuation $k'$. Its exception handler $h'$ is never used. Since the constructed function
is polymorphic, it must be instantiated at $\bar\rho$ to form the final result. If the body $e$
is a non-value, then rule FUN/NON-VAL applies and $e$ is turned into a computation $c$
that becomes the body of the constructed IL function.

Cases are treated as a sequence of individual non-value functions that are not
recursive. Each of these functions is translated and placed into the result record at
the appropriate label.

Basic computations: Figure 3.25 shows the translation of basic terms:
injection into sums, applications, and let-bindings. Also shown is rule VALUE for lifting
syntactic values into the domain of computations. From $\bar e$ (the result of translating
value $e$) it constructs a computation term that passes $\bar e$ to its continuation $k$. The
computation's exception handler $h$ is never used, which is the justification for leaving the
exception type of syntactic values unspecified.

The computation representing $l\ e$, i.e., the creation of a sum value, first runs
sub-computation $c$ corresponding to $e$ to obtain the intended "payload" $x$. The result
that is sent to the continuation is a polymorphic function which receives a record $r$
of other functions, selects $l$ from $r$, and invokes the result with $x$ (the payload) as
its argument. This is simply the dual encoding of sums as functions taking records
as arguments.

Application is simple: after running two sub-computations $c_1$ and $c_2$ to obtain
the callee $x_1$ and its intended argument $x_2$, the callee is invoked with $x_2$ to obtain
the third and final computation. All three computations are invoked with the same
handler argument.
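The VALUE and APP translations can be mimicked with curried closures. This is a minimal sketch under stated assumptions: a computation is a Python function taking a continuation `k` and then a handler record `h` (a dict), types are erased, and the names `value`, `app`, and `succ` are hypothetical.

```python
def value(v):
    """VALUE: pass v to the continuation k; the handler record h is never used."""
    return lambda k: lambda h: k(v)

def app(c1, c2):
    """APP: run c1 to get the callee x1, run c2 to get the argument x2, then run
    the computation x1(x2). All three receive the same handler record h."""
    return lambda k: lambda h: c1(lambda x1: c2(lambda x2: x1(x2)(k)(h))(h))(h)

# A CPS function maps its argument to a computation.
succ = lambda x: value(x + 1)

# Whole programs are started with the identity continuation and an empty
# handler record, as in the PROGRAM rule.
result = app(value(succ), value(41))(lambda x: x)({})
```

Because `h` is threaded unchanged through `c1`, `c2`, and the call itself, an exception raised at any of the three stages would reach the same handlers.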

Non-value let-bindings simply chain two computations together without altering
any handlers. The translation of a polymorphic let-binding invokes the value
translation judgment on the definiens $e_1$ to obtain $\bar e_1$, which is then turned into
a polymorphic value via type abstraction. The constructed value is available to the
sub-computation $c_2$ representing the body $e_2$.

We omitted the rules for type equality, since they are somewhat tedious but
straightforward.

Computations involving records, cases and exceptions: The translations
for records, cases and exception-related expressions are shown in Figures 3.26, 3.27,
and 3.28, respectively. A match computation instantiates its sum argument (bound
to $x_1$) at computation type and applies it to the record of functions $x_2$ representing
the cases. The raise computation, on the other hand, instantiates the sum at type
ans and applies it to $h$, i.e., the current record of exception handlers. It does not
use its regular continuation $k$, justifying the typing rule that leaves the result type
unconstrained.
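The contrast between match and raise shows up clearly in an untyped sketch. The code below is illustrative only and makes the following assumptions: computations are curried Python functions of a continuation `k` and a handler dict `h`, sums use the dual encoding, type instantiation is erased, and all function names are hypothetical.

```python
def value(v):
    """Lift a value into a computation that passes it to the continuation."""
    return lambda k: lambda h: k(v)

def inject(label, payload):
    """Dual encoding of a sum value: select `label` from a record and apply it."""
    return lambda record: record[label](payload)

def match(c1, c2):
    """MATCH: apply the sum x1 to the record of CPS functions x2; the selected
    function returns a computation, which is then run with k and h."""
    return lambda k: lambda h: c1(lambda x1: c2(lambda x2: x1(x2)(k)(h))(h))(h)

def raise_(c):
    """RAISE: apply the sum x to the handler record h; k is never used."""
    return lambda k: lambda h: c(lambda x: x(h))(h)

cases = {"Some": lambda v: value(v * 2), "None": lambda _: value(0)}
matched = match(value(inject("Some", 21)), value(cases))(lambda x: x)({})

handlers = {"Oops": lambda payload: ("handled", payload)}
raised = raise_(value(inject("Oops", 5)))(lambda x: ("returned", x))(handlers)
```

In `raised` the continuation that would have produced `("returned", ...)` is dropped, which corresponds to the result type of a raise expression being unconstrained.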
[Figure 3.23: Translation of EL types to IL types, defining the judgments $\tau \rightsquigarrow \bar\tau$ and $\rho \rightsquigarrow \bar\rho$.]

A case extension computation extends a record of functions representing cases,
while the handle computation extends the record of (continuation-) functions
representing handlers. The rules for unhandle and rehandle are similar to that for
HANDLE: in the former case a field is dropped from the handler record, while in the
latter a field is replaced. Similar operations exist for cases, but for brevity we have
omitted them from the discussion.

The HANDLE-ALL rule is the only rule introducing reify into its output term. It
is used to build a new exception-handler record from $\bar\rho'$, which is the exception type
of $e_1$. Each field $l_i$ of this record receives the payload of exception $l_i$, injects it into
the corresponding sum type, and passes the result (as a binding to $x$) to the computation
specified by $e_2$.
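The role of reify in HANDLE-ALL can be sketched with erased types. The assumptions are the same as in a CPS model of the translation: computations are curried Python functions of a continuation `k` and a handler dict `h`, and sums use the dual encoding. Since Python has no row types, the label set that IL reads off the exception type is passed explicitly as `labels`; this parameter and all function names are hypothetical.

```python
def value(v):
    return lambda k: lambda h: k(v)

def inject(label, payload):
    """Dual encoding of a sum value."""
    return lambda record: record[label](payload)

def raise_(c):
    """RAISE: apply the sum to the current handler record."""
    return lambda k: lambda h: c(lambda x: x(h))(h)

def handle(c1, label, body):
    """HANDLE: run c1 with h extended by one field; `body` maps the payload
    to the handler computation, which runs with the original k and h."""
    return lambda k: lambda h: c1(k)({**h, label: lambda x: body(x)(k)(h)})

def reify(labels, f):
    """reify: turn a function on sums into a record of functions. Field l
    receives a payload, injects it at l, and passes the sum to f."""
    return {l: (lambda x, l=l: f(inject(l, x))) for l in labels}

def handle_all(c1, labels, body):
    """HANDLE-ALL: run c1 with a brand-new handler record built by reify."""
    return lambda k: lambda h: c1(k)(reify(labels, lambda x: body(x)(k)(h)))

one = handle(raise_(value(inject("Oops", 5))), "Oops", lambda x: value(x + 1))

# The catch-all body inspects the exception by matching on the sum value.
tag = lambda exn: value(exn({"A": lambda p: ("A", p), "B": lambda p: ("B", p)}))
every = handle_all(raise_(value(inject("B", 2))), ["A", "B"], tag)
```

Running `one` and `every` with the identity continuation and an empty handler record shows the difference: `handle` extends the incoming record, whereas `handle_all` discards it and installs one field per label.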

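Properties of $\rightsquigarrow$

An important property of the translation is that it translates well-formed EL
expressions to well-formed IL expressions. Before we proceed to establishing the correctness
of $\rightsquigarrow$, we set up a few helper lemmas:
Lemma 3.4.7 (Type synonyms)

1. If $\bar\Delta; \bar\Gamma \vdash \lambda k : \bar\tau\ \mathsf{cont}.\lambda h : \bar\rho\ \mathsf{hdlr}.\bar e : (\bar\tau, \bar\rho)\ \mathsf{comp}$, then
$\bar\Delta; \bar\Gamma, k : \bar\tau\ \mathsf{cont}, h : \bar\rho\ \mathsf{hdlr} \vdash \bar e : \mathsf{ans}$.

2. If $\bar\Delta; \bar\Gamma, h : \bar\rho\ \mathsf{hdlr} \vdash c : (\bar\tau, \bar\rho)\ \mathsf{comp}$ and $\bar\Delta; \bar\Gamma, h : \bar\rho\ \mathsf{hdlr} \vdash c\ \bar e\ h : \mathsf{ans}$, then
$\bar\Delta; \bar\Gamma, h : \bar\rho\ \mathsf{hdlr} \vdash \bar e : \bar\tau\ \mathsf{cont}$.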
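$$
\frac{\begin{array}{c}\Gamma(x) = \forall\alpha_1 : \kappa_1 \ldots \forall\alpha_n : \kappa_n.\tau' \qquad \forall i \in 1..n.\ \Delta \vdash \theta_i : \kappa_i \\ \tau = \tau'[\theta_1/\alpha_1, \ldots, \theta_n/\alpha_n] \qquad \tau \rightsquigarrow \bar\tau \qquad \forall i \in 1..n.\ \theta_i \rightsquigarrow \bar\theta_i\end{array}}
     {\Delta; \Gamma \vdash_v x : \tau \rightsquigarrow x[\bar\theta_1] \ldots [\bar\theta_n] : \bar\tau}\ \text{(var)}
\qquad
\frac{}{\Delta; \Gamma \vdash_v n : \mathsf{int} \rightsquigarrow n : \mathsf{int}}\ \text{(int)}
$$

$$
\frac{\begin{array}{c}\Delta; \Gamma, f : \forall\alpha : \emptyset.\tau_2 \xrightarrow{\alpha} \tau, x : \tau_2 \vdash_v e : \tau \rightsquigarrow \bar e : \bar\tau \\ \Delta \vdash \tau_2 : \star \qquad \tau_2 \rightsquigarrow \bar\tau_2 \qquad \Delta \vdash \rho : \emptyset \qquad \rho \rightsquigarrow \bar\rho\end{array}}
     {\begin{array}{l}\Delta; \Gamma \vdash_v \mathbf{fun}\ f\ x = e : \tau_2 \xrightarrow{\rho} \tau \\ \quad \rightsquigarrow \mathbf{letrec}\ f : \forall\alpha : \emptyset.\bar\tau_2 \xrightarrow{\alpha} \bar\tau = \Lambda\alpha.\lambda x.\lambda k'.\lambda h'.k'\ \bar e\ \mathbf{in}\ f[\bar\rho] : \bar\tau_2 \xrightarrow{\bar\rho} \bar\tau\end{array}}\ \text{(fun/val)}
$$

$$
\frac{\begin{array}{c}\Delta; \Gamma, f : \tau_2 \xrightarrow{\rho} \tau, x : \tau_2 \vdash e : \tau; \rho \rightsquigarrow c : (\bar\tau, \bar\rho)\ \mathsf{comp} \\ \Delta \vdash \tau_2 : \star \qquad \tau_2 \rightsquigarrow \bar\tau_2 \qquad \Delta \vdash \rho : \emptyset\end{array}}
     {\Delta; \Gamma \vdash_v \mathbf{fun}\ f\ x = e : \tau_2 \xrightarrow{\rho} \tau \rightsquigarrow \mathbf{letrec}\ f : \bar\tau_2 \xrightarrow{\bar\rho} \bar\tau = \lambda x.c\ \mathbf{in}\ f : \bar\tau_2 \xrightarrow{\bar\rho} \bar\tau}\ \text{(fun/non-val)}
$$

$$
\frac{\begin{array}{c}\forall i \in 1..n.\ \Delta; \Gamma, x_i : \tau_i \vdash e_i : \tau; \rho \rightsquigarrow c_i : (\bar\tau, \bar\rho)\ \mathsf{comp} \\ \Delta \vdash (l_1 : \tau_1, \ldots, l_n : \tau_n, \cdot) : \emptyset \qquad \forall i \in 1..n.\ \tau_i \rightsquigarrow \bar\tau_i\end{array}}
     {\Delta; \Gamma \vdash_v \{ l_i\ x_i \Rightarrow e_i \}_{i=1}^{n} : \langle l_i : \tau_i \rangle_{i=1}^{n} \stackrel{\rho}{\hookrightarrow} \tau \rightsquigarrow \{ l_i = \lambda x_i.c_i \}_{i=1}^{n} : \langle l_i : \bar\tau_i \rangle_{i=1}^{n} \stackrel{\bar\rho}{\hookrightarrow} \bar\tau}\ \text{(c)}
$$

Figure 3.24: The translation from EL to IL for syntactic values.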
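$$
\frac{\Delta; \Gamma \vdash_v e : \tau \rightsquigarrow \bar e : \bar\tau \qquad \Delta \vdash \rho : \emptyset \qquad \rho \rightsquigarrow \bar\rho}
     {\Delta; \Gamma \vdash e : \tau; \rho \rightsquigarrow \lambda k.\lambda h.k\ \bar e : (\bar\tau, \bar\rho)\ \mathsf{comp}}\ \text{(value)}
$$

$$
\frac{\Delta; \Gamma \vdash e : \tau; \rho' \rightsquigarrow c : (\bar\tau, \bar\rho')\ \mathsf{comp} \qquad \Delta \vdash (l : \tau, \rho) : \emptyset \qquad \rho \rightsquigarrow \bar\rho}
     {\Delta; \Gamma \vdash l\ e : \langle l : \tau, \rho \rangle; \rho' \rightsquigarrow \lambda k.\lambda h.c\ (\lambda x.k\ (\Lambda\alpha.\lambda r.(r.l\ x)))\ h : ((\!|l : \bar\tau, \bar\rho|\!), \bar\rho')\ \mathsf{comp}}\ \text{(dcon)}
$$

$$
\frac{\Delta; \Gamma \vdash e_1 : \tau_2 \xrightarrow{\rho} \tau; \rho \rightsquigarrow c_1 : (\bar\tau_2 \xrightarrow{\bar\rho} \bar\tau, \bar\rho)\ \mathsf{comp} \qquad \Delta; \Gamma \vdash e_2 : \tau_2; \rho \rightsquigarrow c_2 : (\bar\tau_2, \bar\rho)\ \mathsf{comp}}
     {\Delta; \Gamma \vdash e_1\ e_2 : \tau; \rho \rightsquigarrow \lambda k.\lambda h.c_1\ (\lambda x_1.c_2\ (\lambda x_2.x_1\ x_2\ k\ h)\ h)\ h : (\bar\tau, \bar\rho)\ \mathsf{comp}}\ \text{(app)}
$$

$$
\frac{\begin{array}{c}\{\alpha_1, \ldots, \alpha_n\} = \mathrm{FTV}(\tau_1) \setminus \mathrm{FTV}(\Gamma) \qquad \Delta, \alpha_1 : \kappa_1, \ldots, \alpha_n : \kappa_n; \Gamma \vdash_v e_1 : \tau_1 \rightsquigarrow \bar e_1 : \bar\tau_1 \\ \Delta; \Gamma, x : \forall\alpha_1 : \kappa_1 \ldots \forall\alpha_n : \kappa_n.\tau_1 \vdash e_2 : \tau_2; \rho \rightsquigarrow c_2 : (\bar\tau_2, \bar\rho)\ \mathsf{comp}\end{array}}
     {\begin{array}{l}\Delta; \Gamma \vdash \mathbf{let}\ x = e_1\ \mathbf{in}\ e_2 : \tau_2; \rho \\ \quad \rightsquigarrow \lambda k.\lambda h.\mathbf{let}\ x : \forall\alpha_1 : \kappa_1 \ldots \forall\alpha_n : \kappa_n.\bar\tau_1 = \Lambda\alpha_1 \ldots \Lambda\alpha_n.\bar e_1\ \mathbf{in}\ c_2\ k\ h : (\bar\tau_2, \bar\rho)\ \mathsf{comp}\end{array}}\ \text{(let/val)}
$$

$$
\frac{\Delta; \Gamma \vdash e_1 : \tau_1; \rho \rightsquigarrow c_1 : (\bar\tau_1, \bar\rho)\ \mathsf{comp} \qquad \Delta; \Gamma, x : \tau_1 \vdash e_2 : \tau_2; \rho \rightsquigarrow c_2 : (\bar\tau_2, \bar\rho)\ \mathsf{comp}}
     {\Delta; \Gamma \vdash \mathbf{let}\ x = e_1\ \mathbf{in}\ e_2 : \tau_2; \rho \rightsquigarrow \lambda k.\lambda h.c_1\ (\lambda x.c_2\ k\ h)\ h : (\bar\tau_2, \bar\rho)\ \mathsf{comp}}\ \text{(let/non-val)}
$$

$$
\frac{\Delta; \Gamma \vdash e : \langle \rho[\alpha\ \mathbf{as}\ \langle\rho\rangle/\alpha] \rangle; \rho' \rightsquigarrow c : ((\!|\bar\rho[\alpha\ \mathbf{as}\ (\!|\bar\rho|\!)/\alpha]|\!), \bar\rho')\ \mathsf{comp}}
     {\Delta; \Gamma \vdash e : \alpha\ \mathbf{as}\ \langle\rho\rangle; \rho' \rightsquigarrow c : (\alpha\ \mathbf{as}\ (\!|\bar\rho|\!), \bar\rho')\ \mathsf{comp}}\ \text{(roll)}
$$

$$
\frac{\Delta; \Gamma \vdash e : \alpha\ \mathbf{as}\ \langle\rho\rangle; \rho' \rightsquigarrow c : (\alpha\ \mathbf{as}\ (\!|\bar\rho|\!), \bar\rho')\ \mathsf{comp}}
     {\Delta; \Gamma \vdash e : \langle \rho[\alpha\ \mathbf{as}\ \langle\rho\rangle/\alpha] \rangle; \rho' \rightsquigarrow c : ((\!|\bar\rho[\alpha\ \mathbf{as}\ (\!|\bar\rho|\!)/\alpha]|\!), \bar\rho')\ \mathsf{comp}}\ \text{(unroll)}
$$

Figure 3.25: The translation from EL to IL for basic computations.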
[Figure 3.26: The translation from EL to IL for computations involving records. Rules: R, R/EXT, R/SUB, SELECT.]



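$$
\frac{\begin{array}{c}\Delta; \Gamma \vdash e_1 : \langle\rho_1\rangle \stackrel{\rho}{\hookrightarrow} \tau; \rho' \rightsquigarrow c_1 : (\langle\bar\rho_1\rangle \stackrel{\bar\rho}{\hookrightarrow} \bar\tau, \bar\rho')\ \mathsf{comp} \qquad \Delta \vdash (l : \tau_1, \rho_1) : \emptyset \\ \Delta; \Gamma, x : \tau_1 \vdash e_2 : \tau; \rho \rightsquigarrow c_2 : (\bar\tau, \bar\rho)\ \mathsf{comp} \qquad (l : \tau_1, \rho_1) \rightsquigarrow (l : \bar\tau_1, \bar\rho_1)\end{array}}
     {\begin{array}{l}\Delta; \Gamma \vdash e_1 \oplus \{ l\ x \Rightarrow e_2 \} : \langle l : \tau_1, \rho_1 \rangle \stackrel{\rho}{\hookrightarrow} \tau; \rho' \\ \quad \rightsquigarrow \lambda k.\lambda h.c_1\ (\lambda x_1.k\ (x_1 \otimes \{ l = \lambda x.c_2 \}))\ h : (\langle l : \bar\tau_1, \bar\rho_1 \rangle \stackrel{\bar\rho}{\hookrightarrow} \bar\tau, \bar\rho')\ \mathsf{comp}\end{array}}\ \text{(c/ext)}
$$

$$
\frac{\Delta; \Gamma \vdash e_1 : \langle l : \tau', \rho_1 \rangle \stackrel{\rho}{\hookrightarrow} \tau; \rho' \rightsquigarrow c_1 : (\langle l : \bar\tau', \bar\rho_1 \rangle \stackrel{\bar\rho}{\hookrightarrow} \bar\tau, \bar\rho')\ \mathsf{comp}}
     {\Delta; \Gamma \vdash e_1 \ominus l : \langle\rho_1\rangle \stackrel{\rho}{\hookrightarrow} \tau; \rho' \rightsquigarrow \lambda k.\lambda h.c_1\ (\lambda x_1.k\ (x_1 \oslash l))\ h : (\langle\bar\rho_1\rangle \stackrel{\bar\rho}{\hookrightarrow} \bar\tau, \bar\rho')\ \mathsf{comp}}\ \text{(c/sub)}
$$

$$
\frac{\Delta; \Gamma \vdash e_1 : \langle\rho\rangle; \rho' \rightsquigarrow c_1 : ((\!|\bar\rho|\!), \bar\rho')\ \mathsf{comp} \qquad \Delta; \Gamma \vdash e_2 : \langle\rho\rangle \stackrel{\rho'}{\hookrightarrow} \tau; \rho' \rightsquigarrow c_2 : (\langle\bar\rho\rangle \stackrel{\bar\rho'}{\hookrightarrow} \bar\tau, \bar\rho')\ \mathsf{comp}}
     {\Delta; \Gamma \vdash \mathbf{match}\ e_1\ \mathbf{with}\ e_2 : \tau; \rho' \rightsquigarrow \lambda k.\lambda h.c_1\ (\lambda x_1.c_2\ (\lambda x_2.x_1[(\bar\tau, \bar\rho')\ \mathsf{comp}]\ x_2\ k\ h)\ h)\ h : (\bar\tau, \bar\rho')\ \mathsf{comp}}\ \text{(match)}
$$

Figure 3.27: The translation from EL to IL for computations involving cases.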
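$$
\frac{\Delta; \Gamma \vdash e : \langle\rho\rangle; \rho \rightsquigarrow c : ((\!|\bar\rho|\!), \bar\rho)\ \mathsf{comp} \qquad \Delta \vdash \tau : \star \qquad \tau \rightsquigarrow \bar\tau}
     {\Delta; \Gamma \vdash \mathbf{raise}\ e : \tau; \rho \rightsquigarrow \lambda k.\lambda h.c\ (\lambda x.x[\mathsf{ans}]\ h)\ h : (\bar\tau, \bar\rho)\ \mathsf{comp}}\ \text{(raise)}
$$

$$
\frac{\Delta; \Gamma \vdash e_1 : \tau; l : \tau', \rho \rightsquigarrow c_1 : (\bar\tau, (l : \bar\tau', \bar\rho))\ \mathsf{comp} \qquad \Delta; \Gamma, x : \tau' \vdash e_2 : \tau; \rho \rightsquigarrow c_2 : (\bar\tau, \bar\rho)\ \mathsf{comp}}
     {\Delta; \Gamma \vdash e_1\ \mathbf{handle}\ \{ l\ x \Rightarrow e_2 \} : \tau; \rho \rightsquigarrow \lambda k.\lambda h.c_1\ k\ (h \otimes \{ l = \lambda x.c_2\ k\ h \}) : (\bar\tau, \bar\rho)\ \mathsf{comp}}\ \text{(handle)}
$$

$$
\frac{\Delta; \Gamma \vdash e : \tau; \rho \rightsquigarrow c : (\bar\tau, \bar\rho)\ \mathsf{comp} \qquad \Delta \vdash (l : \tau', \rho) : \emptyset \qquad \tau' \rightsquigarrow \bar\tau' \qquad \rho \rightsquigarrow \bar\rho}
     {\Delta; \Gamma \vdash e\ \mathbf{unhandle}\ l : \tau; l : \tau', \rho \rightsquigarrow \lambda k.\lambda h.c\ k\ (h \oslash l) : (\bar\tau, (l : \bar\tau', \bar\rho))\ \mathsf{comp}}\ \text{(unhandle)}
$$

$$
\frac{\Delta; \Gamma \vdash e_1 : \tau; l : \tau', \rho \rightsquigarrow c_1 : (\bar\tau, (l : \bar\tau', \bar\rho))\ \mathsf{comp} \qquad \Delta; \Gamma, x : \tau' \vdash e_2 : \tau; l : \tau'', \rho \rightsquigarrow c_2 : (\bar\tau, (l : \bar\tau'', \bar\rho))\ \mathsf{comp}}
     {\Delta; \Gamma \vdash e_1\ \mathbf{rehandle}\ \{ l\ x \Rightarrow e_2 \} : \tau; l : \tau'', \rho \rightsquigarrow \lambda k.\lambda h.c_1\ k\ ((h \oslash l) \otimes \{ l = \lambda x.c_2\ k\ h \}) : (\bar\tau, (l : \bar\tau'', \bar\rho))\ \mathsf{comp}}\ \text{(rehandle)}
$$

$$
\frac{\Delta; \Gamma \vdash e_1 : \tau; \rho' \rightsquigarrow c_1 : (\bar\tau, \bar\rho')\ \mathsf{comp} \qquad \Delta; \Gamma, x : \langle\rho'\rangle \vdash e_2 : \tau; \rho \rightsquigarrow c_2 : (\bar\tau, \bar\rho)\ \mathsf{comp}}
     {\Delta; \Gamma \vdash e_1\ \mathbf{handle}\ \{ x \Rightarrow e_2 \} : \tau; \rho \rightsquigarrow \lambda k.\lambda h.c_1\ k\ (\mathbf{reify}[\bar\rho'][\mathsf{ans}]\ (\lambda x.c_2\ k\ h)) : (\bar\tau, \bar\rho)\ \mathsf{comp}}\ \text{(handle-all)}
$$

$$
\frac{\emptyset; \Gamma_0 \vdash e : \mathsf{int}; \cdot \rightsquigarrow c : (\mathsf{int}, \cdot)\ \mathsf{comp}}
     {\Gamma_0 \vdash e\ \mathbf{program} \rightsquigarrow c\ (\lambda x.x)\ \{\} : \mathsf{ans}}\ \text{(program)}
$$

Figure 3.28: The translation from EL to IL for computations involving exceptions.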
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3. If A.;T,k : f cont h c : (t,p) comp and A;r,A; : f cont h c k e : ans, tien 
A; r, /c : f cont h e : p hdlr. 

Proof: By defintion of (f, p) comp which is f cont ^ p hdlr ^ ans and by the typing 
rule of T-ABS and T-APP. ■ 

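Lemma 3.4.8 (Weakening of Δ; Γ)

If $\bar\Delta;\bar\Gamma \vdash c : (\bar\tau,\bar\rho)\ \mathrm{comp}$, then $\bar\Delta';\bar\Gamma' \vdash c : (\bar\tau,\bar\rho)\ \mathrm{comp}$ for all $\bar\Gamma'$ and $\bar\Delta'$ such that
$\bar\Gamma' \supseteq \bar\Gamma$ and $\bar\Delta' \supseteq \bar\Delta$.

Proof: By induction on a derivation of $\bar\Delta;\bar\Gamma \vdash c : (\bar\tau,\bar\rho)\ \mathrm{comp}$. ■

Definition 3.4.9 (Translation of environments)

\[
\begin{array}{rcl}
C(\emptyset) &=& \emptyset \\
C(\Gamma, x \mapsto \sigma) &=& C(\Gamma),\ x \mapsto C(\sigma) \\
C(\tau) &=& \bar\tau \quad\text{where } \tau \rightsquigarrow \bar\tau \\
C(\forall\alpha : \kappa.\,\sigma) &=& \forall\alpha : \kappa.\,C(\sigma) \\
C(\Delta, \alpha \mapsto \kappa) &=& C(\Delta),\ \alpha \mapsto \kappa
\end{array}
\]

Lemma 3.4.10 (Translation of Γ)

If $\Delta \vdash \tau : \kappa$, then $C(\Delta) \vdash \bar\tau : \kappa$.

Proof: By induction on a derivation of $\Delta \vdash \tau : \kappa$. ■

Lemma 3.4.11 (Substitution)

If $\tau = \tau'[\tau_1/\alpha_1, \ldots, \tau_n/\alpha_n]$ and $\tau \rightsquigarrow \bar\tau$, then $\bar\tau = \bar\tau'[\bar\tau_1/\alpha_1, \ldots, \bar\tau_n/\alpha_n]$ where $\tau' \rightsquigarrow \bar\tau'$
and $\forall i \in 1..n.\ \tau_i \rightsquigarrow \bar\tau_i$.

Proof: By induction on $\tau$. ■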

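These lemmas allow us to prove correctness of $\rightsquigarrow$:

Lemma 3.4.12 (Correctness of translation)

If $\Delta;\Gamma \vdash e : \tau;\rho \rightsquigarrow c : (\bar\tau,\bar\rho)\ \mathrm{comp}$ and $\bar\Gamma \supseteq C(\Gamma)$ and $\bar\Delta \supseteq C(\Delta)$, then $\bar\Delta;\bar\Gamma \vdash c :
(\bar\tau,\bar\rho)\ \mathrm{comp}$.

Proof: By induction on a derivation of $\Delta;\Gamma \vdash e : \tau;\rho \rightsquigarrow c : (\bar\tau,\bar\rho)\ \mathrm{comp}$. At each
step of induction, we assume that the desired property holds for all subderivations
and proceed by cases on the possible shape of $e$ to show that $\bar\Delta;\bar\Gamma \vdash c : (\bar\tau,\bar\rho)\ \mathrm{comp}$.
By Lemma 3.4.7 it is sufficient to show that (STS) $\bar\Delta;\bar\Gamma, k : \bar\tau\ \mathrm{cont}, h : \bar\rho\ \mathrm{hdlr} \vdash \bar e : \mathrm{ans}$
where $c = \lambda k : \bar\tau\ \mathrm{cont}.\ \lambda h : \bar\rho\ \mathrm{hdlr}.\ \bar e$. Then, the proofs are straightforward. We present the
case HANDLE-ALL as an example.

• Case $e = e_1\ \mathbf{handle}\ \{x \Rightarrow e_2\}$ and $\bar e = c_1\ k\ (\mathbf{reify}[\bar\rho'][\mathrm{ans}]\ (\lambda x.\ c_2\ k\ h))$.

STS: $\bar\Delta;\bar\Gamma, k : \bar\tau\ \mathrm{cont}, h : \bar\rho\ \mathrm{hdlr} \vdash c_1\ k\ (\mathbf{reify}[\bar\rho'][\mathrm{ans}]\ (\lambda x : \langle\bar\rho'\rangle.\ c_2\ k\ h)) : \mathrm{ans}$. By IH
for $e_1$ and Lemma 3.4.7, STS: $\bar\Delta;\bar\Gamma, k : \_, h : \_ \vdash \mathbf{reify}[\bar\rho'][\mathrm{ans}]\ (\lambda x : \langle\bar\rho'\rangle.\ c_2\ k\ h) :
\bar\rho'\ \mathrm{hdlr}$ (which is true by T-REIFY). ■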
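
The handler-passing style of the translation in Figure 3.28 can be mimicked in an untyped setting. The following Python sketch is only an illustration under simplifying assumptions, not the IL itself: a computation is a curried function that takes a continuation `k` and a handler record `h` (modelled as a dict from labels to functions), an exception is a label plus a payload rather than a polymorphic sum, and the helper names `ret`, `raise_`, `handle` and `run` are hypothetical.

```python
# A computation takes a continuation k and then a handler record h (a dict).

def ret(v):
    """Computation that returns v to its continuation."""
    return lambda k: lambda h: k(v)

def raise_(label, v):
    """Raise `label` with payload v: select the handler from h and call it."""
    return lambda k: lambda h: h[label](v)

def handle(c1, label, body):
    """Sketch of `c1 handle { label x => body(x) }`.

    Mirrors  \\k.\\h. c1 k (h (x) {label = \\x. c2 k h}):
    c1 runs with h extended by a new handler, and the handler body runs
    under the ORIGINAL continuation k and the ORIGINAL handler record h.
    """
    def comp(k):
        def with_handlers(h):
            extended = {**h, label: lambda x: body(x)(k)(h)}
            return c1(k)(extended)
        return with_handlers
    return comp

def run(c):
    """Top level: identity continuation and the empty handler record."""
    return c(lambda x: x)({})

ok = run(handle(ret(1), "Neg", lambda x: ret(-x)))
recovered = run(handle(raise_("Neg", 5), "Neg", lambda x: ret(x + 100)))
nested = run(handle(handle(raise_("A", 1), "B", lambda x: ret(0)),
                    "A", lambda x: ret(x + 1)))
```

In this sketch an unhandled `raise_` would fail with a `KeyError` at run time; in the typed setting the row type of `h` rules that situation out statically.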



3.5 Untyped λ-Calculus with records (LRec)

IL expressions are translated into expressions of a variant of an untyped language,
called LRec, which is closer to machine code. Its essence is that records are represented
as vectors with slots that are addressed numerically. Therefore, the labels in every row
are mapped to indices that form an initial segment of the natural numbers. Individual
labels are assigned to slots in increasing order, relying on an arbitrary but fixed total
order on the set of labels.

The LRec language extends the untyped λ-calculus with (n-ary) tuples and named
functions; Figure 3.29 shows the abstract syntax for LRec. The terms of the language,
denoted by $e$, consist of numbers $n$, variables $x$, the operations plus and minus,
len($e$) for determining the number of fields in a tuple $e$, named functions, function
application, and introduction and elimination forms for tuples. The introduction
form for tuples, $\langle s_i \rangle_{i=1}^{n}$, specifies a sequence of slices from which the tuple is being
constructed. The elimination form for tuples is selection (projection), written $e_1.e_2$,
which projects out the field with index $e_2$ from the tuple $e_1$. The terms include a let
expression (as syntactic sugar for application) and a simple conditional expression
ifzero($e$, $e$, $e$). A slice, denoted by $s$, is either a term or a triple of terms $(e_1, e_2, e_3)$,
where $e_1$ yields a record while $e_2$ and $e_3$ must evaluate to numbers. A slice $(e_1, e_2, e_3)$
specifies consecutive fields of the record $e_1$ between the indices $e_2$ (inclusive) and
$e_3$ (exclusive).

Figure 3.31 shows the dynamic semantics for LRec. We enforce an order on evaluation
by assuming that the premises are evaluated from left to right and top to bottom
(in that order). The semantics is largely standard. The only interesting judgments
concern evaluation of slices and construction of tuples. Slices evaluate to a sequence
of values selected by the specified indices (if any). Tuple selection projects out the
field with the specified index from the tuple. Since tuples can be implemented
as arrays, selection can be implemented in constant time. Thus, if records
can be transformed into tuples and record selection can be transformed into tuple
selection, record operations can be implemented in constant time. The computation
of the indices is the key component of the translation from IL to LRec.
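
To make the role of slices concrete, here is a minimal Python sketch of tuple construction from slices and of constant-time selection. It is an illustration only: the names `Slice`, `eval_slice`, `make_tuple` and `select` are hypothetical, and indices are assumed to start at 0.

```python
from typing import Any, NamedTuple, Tuple

class Slice(NamedTuple):
    """A slice (tup, lo, hi): the fields of tup from index lo up to hi (exclusive)."""
    tup: Tuple[Any, ...]
    lo: int
    hi: int

def eval_slice(s):
    """A slice denotes a sequence of values; a plain term denotes one value."""
    if isinstance(s, Slice):
        return list(s.tup[s.lo:s.hi])
    return [s]

def make_tuple(*slices):
    """Tuple introduction: concatenate the values denoted by all slices."""
    fields = []
    for s in slices:
        fields.extend(eval_slice(s))
    return tuple(fields)

def select(t, i):
    """Tuple elimination: array indexing, hence constant time."""
    return t[i]

t = make_tuple(10, 20, 30)
# Splice a new field between index 0 and index 1 of t.
u = make_tuple(Slice(t, 0, 1), 99, Slice(t, 1, len(t)))
```

Here `u` is built without inspecting any labels: only numeric indices are needed at run time, which is why the translation has to compute them.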



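\[
\begin{array}{llcl}
\text{Terms}  & e & ::= & n \mid x \mid e_1 + e_2 \mid e_1 - e_2 \mid \mathrm{len}(e) \mid \lambda x.e \mid e_1\ e_2 \mid \langle s_i \rangle_{i=1}^{n} \mid e.e \mid \\
              &   &     & \mathbf{let}\ x = e_1\ \mathbf{in}\ e_2 \mid \mathbf{letrec}\ f = \lambda x.e_1\ \mathbf{in}\ e_2 \mid \mathbf{ifzero}\,(e_1, e_2, e_3) \\
\text{Slices} & s & ::= & e \mid (e, e, e) \\
\text{Values} & v & ::= & n \mid \langle v_i \rangle_{i=1}^{n} \mid \lambda x.e
\end{array}
\]

Figure 3.29: The syntax for the LRec language.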



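\[
\begin{array}{lcl}
E   & ::= & [\,] \mid E\ e \mid v\ E \mid E + e \mid v + E \mid E - e \mid v - E \mid \mathrm{len}(E) \mid \mathbf{let}\ x = E\ \mathbf{in}\ e \mid \mathbf{ifzero}\,(E, e, e) \mid \\
    &     & E.e \mid v.E \mid \langle \ldots, v_{i-1}, E_s, s_{i+1}, \ldots \rangle \\
E_s & ::= & [\,] \mid E \mid (E, e, e) \mid (v, E, e) \mid (v, v, E)
\end{array}
\]

Figure 3.30: Evaluation contexts for LRec.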



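\[
\begin{array}{rcll}
E[(\lambda x.e)\ v] & \mapsto & E[e[v/x]] & (\textsc{app}) \\
E[n_1 + n_2] & \mapsto & E[n] \quad\text{where } n = n_1 + n_2 & (\textsc{plus}) \\
E[n_1 - n_2] & \mapsto & E[n] \quad\text{where } n = n_1 - n_2 & (\textsc{minus}) \\
E[\mathrm{len}(\langle v_1, \ldots, v_n \rangle)] & \mapsto & E[n] & (\textsc{len}) \\
E[\mathbf{let}\ x = v\ \mathbf{in}\ e] & \mapsto & E[e[v/x]] & (\textsc{let}) \\
E[\mathbf{letrec}\ f = \lambda x.e_1\ \mathbf{in}\ e_2] & \mapsto & E[e_2[(\lambda x.e_1[(\mathbf{letrec}\ f = \lambda x.e_1\ \mathbf{in}\ f)/f])/f]] & (\textsc{rec/fun}) \\
E[\mathbf{ifzero}\,(0, e_1, e_2)] & \mapsto & E[e_1] & (\textsc{ifzero/true}) \\
E[\mathbf{ifzero}\,(n, e_1, e_2)] & \mapsto & E[e_2] \quad\text{where } n \neq 0 & (\textsc{ifzero/false}) \\
E[\langle v_1, \ldots, v_i, \ldots, v_n \rangle.i] & \mapsto & E[v_i] & (\textsc{select}) \\
E_s[v] & \mapsto & v & (\textsc{slice/singleton}) \\
E_s[(\langle \ldots, v_i, \ldots, v_j, \ldots, v_n \rangle, i, j)] & \mapsto & v_i, \ldots, v_{j-1} & (\textsc{slice/sequence})
\end{array}
\]

Figure 3.31: Operational semantics for LRec.

3.5.1 From IL to LRec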

Figure 3.33 shows the translation from IL into the LRec language. The translation
takes place under an index context, denoted by $\Sigma$, that maps row variables to sets
consisting of label and term pairs:

\[
\Sigma \ ::=\ \emptyset \mid \Sigma,\ \beta \mapsto \{(l_i, e_i)\}_{i=1}^{n}
\]

Then, for a row variable $\beta$, $\Sigma(\beta) = \{(l_1, e_1), \ldots, (l_n, e_n)\}$ where $e_i$ is the term that
will aid in computing the index for $l_i$ in a record. Additionally, we define two auxiliary
functions: $\mathrm{proj}_i(\Sigma, \beta, l)$ for the index (term) of $l$ for $\beta$, and $\mathrm{proj}_l(\Sigma, \beta)$ for projecting
out the labels from a row variable $\beta$.

\[
\begin{array}{rcl}
\mathrm{proj}_i(\Sigma, \beta, l) &=& e \quad\text{if } (l, e) \in \Sigma(\beta) \\
\mathrm{proj}_l(\Sigma, \beta) &=& \{\, l \mid (l, e) \in \Sigma(\beta) \,\}
\end{array}
\]

The translation of numbers, variables, functions, applications, and let expressions
is straightforward. A record is translated into a tuple of slices, each of which is
obtained by translating the label expressions. The slices are sorted based on the
corresponding labels. Since sorting can re-arrange the ordering of the fields, the
transformation first evaluates the fields in their original order by binding them to
variables and then constructs the tuple using those variables.

A record selection is translated by computing the index for the label being projected
based on the type of the record. To compute indices for record labels, the
translation relies on two functions pos and labels. Given a set of labels $L$ and a label
$l$, define the position of $l$ in $L$, denoted $\mathrm{pos}(l, L)$, as the number of labels of $L$ that
are less than $l$ in the total order defined on labels:

\[
\mathrm{pos}(l, L) = |\{\, l' \mid l' \in L \wedge l' <_l l \,\}|
\]

where $|\{l_1, \ldots, l_n\}| = n$ and $<_l$ denotes the ordering relation on labels. For a given
record type $\{\bar\rho\}$, define $\mathrm{labels}(\{\bar\rho\})$ to be the pair consisting of the set of labels and
the remainder row, which is either empty or a row variable. More precisely:

\[
\begin{array}{rcl}
\mathrm{labels}(\{l_1 : \bar\tau_1, \ldots, l_k : \bar\tau_k, \cdot\}) &=& (\{l_1, \ldots, l_k\},\ \cdot) \\
\mathrm{labels}(\{l_1 : \bar\tau_1, \ldots, l_k : \bar\tau_k, \beta\}) &=& (\{l_1, \ldots, l_k\},\ \beta) \\
\mathrm{labels}(\{l_1 : \bar\tau_1, \ldots, l_k : \bar\tau_k, \beta \to \bar\tau\}) &=& (\{l_1, \ldots, l_k\},\ \beta)
\end{array}
\]

Notice that we treat $\beta \to \bar\tau$ just like plain $\beta$, taking advantage of the fact that
$(\beta \to \bar\tau) \setminus l$ if and only if $\beta \setminus l$.

Let $\bar\rho$ be some row type. We can compute the index of a label $l$ in $\bar\rho$, denoted
$\mathrm{indexOf}(\Sigma, l, \mathrm{labels}(\{\bar\rho\}))$, depending on $\mathrm{labels}(\{\bar\rho\})$, as follows:

\[
\begin{array}{rcl}
\mathrm{indexOf}(\Sigma, l, (L, \cdot)) &=& \mathrm{pos}(l, L) \\
\mathrm{indexOf}(\Sigma, l, (L, \beta)) &=& \mathrm{proj}_i(\Sigma, \beta, l) - \mathrm{pos}(l,\ \mathrm{proj}_l(\Sigma, \beta) \setminus L)
\end{array}
\]

For example, the record extension $e_1 \otimes \{l = e_2\}$ is translated by first finding the
index of $l$ in the tuple corresponding to $e_1$, then splitting the tuple into two slices
at that index, and finally creating a tuple that consists of these two slices along
with a slice consisting of the new field, as Figure 3.32 illustrates. Similarly, record
subtraction splits the tuple for the record immediately before and immediately after
the label being subtracted into two slices and creates a tuple from these slices.
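
The index computation can be replayed on small examples. The Python sketch below is an illustration under stated assumptions: labels are strings ordered by Python's string comparison (standing in for the arbitrary but fixed total order), the index context is a dict from row-variable names to `{label: index}` dicts of already evaluated evidence, and the function names are hypothetical.

```python
def pos(label, labels):
    """Number of labels in `labels` that are smaller than `label`."""
    return sum(1 for other in labels if other < label)

def index_of(sigma, label, known, row_var=None):
    """Index of `label` in a record whose type has the explicit labels `known`
    and, optionally, a row variable `row_var` as its remainder."""
    if row_var is None:                      # closed row: (L, .)
        return pos(label, known)
    evidence = sigma[row_var]                # open row: (L, beta)
    return evidence[label] - pos(label, set(evidence) - set(known))

def extend(tup, index, value):
    """Record extension on tuples: two slices around the new field."""
    return tup[:index] + (value,) + tup[index:]

# Closed record {a = 1, c = 3} extended with b = 2.
i = index_of({}, "b", {"a", "c"})
extended = extend((1, 3), i, 2)

# Row variable beta of kind {a, c}, instantiated with the row (b, d).
# The evidence gives the indices of a and c in the record with labels
# {a, b, c, d}: a is at 0 and c is at 2.
sigma = {"beta": {"a": 0, "c": 2}}
c_in_c_beta = index_of(sigma, "c", {"c"}, "beta")         # record {b, c, d}
c_in_ac_beta = index_of(sigma, "c", {"a", "c"}, "beta")   # record {a, b, c, d}
```

The subtraction in the open-row case corrects for labels of the kind (here `a`) that the evidence counted but that are not actually present in the record at hand.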

Figure 3.32: Record extension.

Type abstractions are translated into functions by creating an argument $x_j$ for
each label $l_j$ in the kind of the bound row type variable. Note that abstractions of ordinary type
variables ($\alpha_j$'s) are simply dropped. Let-bindings for type abstraction (for the purpose of
representing polymorphic recursion) are also straightforward. Type applications are
transformed into function applications by generating "evidence" for each substituted
row-type variable. As with type abstractions, substitutions into ordinary type
variables are dropped. Evidence generation requires computing the index of each label
$l_i \in \kappa_j$ in any record type that extends $\{\bar\rho_j\}$ by adding fields for every such $l_i$.

The situation is somewhat more complicated in the case of reify. As we have
explained earlier, reify is special because its dynamic semantics is inherently
type-sensitive and cannot be explained via type erasure. At runtime reify needs to know
the indices of each label in its row type argument. But since all indices are allocated to
an initial segment of the naturals, it suffices to know the length of the row. Therefore,
our solution is to pass an additional "length index" argument for every row type
variable that is bound by a type abstraction.

To do so, we represent the length of a row by a "pseudo-label" \$len in an index
context ($\Sigma$):

\[
\Sigma \ ::=\ \ldots \mid \Sigma,\ \beta \mapsto \{(l_1, e_1), \ldots, (l_n, e_n), (\$len, e)\}
\]

Then, we can define a helper function lengthOf to determine the length of a row:

\[
\mathrm{lengthOf}(\Sigma, \mathrm{labels}(\bar\tau)) = \mathrm{indexOf}(\Sigma, \$len, \mathrm{labels}(\bar\tau))
\]

Assuming that \$len is greater than any other label in the total order on labels, we
can use indexOf to compute the length of a row.
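
For a closed row the effect of the pseudo-label is easy to check: every real label is smaller than \$len, so its position is the number of labels. The following Python sketch covers only this closed-row case; the ordering function `less` and the name `length_of_closed` are hypothetical.

```python
LEN = "$len"  # pseudo-label standing for the length of a row

def less(a, b):
    """Total order on labels in which $len is greater than every other label."""
    if a == LEN:
        return False
    if b == LEN:
        return True
    return a < b

def pos(label, labels):
    """Number of labels in `labels` that are smaller than `label`."""
    return sum(1 for other in labels if less(other, label))

def length_of_closed(labels):
    """lengthOf for a closed row: the position of $len among its labels."""
    return pos(LEN, labels)

n = length_of_closed({"x", "y", "z"})
```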



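Properties of $\triangleright$

A desirable property of the translation $\triangleright$ is that it preserves the semantics of IL. Let
$P_1$ be a program in IL and $P_2$ a program in LRec obtained by applying $\triangleright$. We wish
to show that if $P_1$ evaluates to $n$, then $P_2$ also evaluates to $n$, assuming that both
languages use the same number values. The approach we will use is similar to Leroy's
proofs by simulation (Leroy 2006). First, we construct a relation $\bar e \sim \underline{e}$.

Definition 3.5.1 ($\bar e \sim \underline{e}$)

\[
\frac{}{n \sim n}
\qquad
\frac{\bar\Delta;\bar\Gamma;\Sigma \vdash \bar e : \bar\tau \triangleright \underline{e}}{\bar e \sim \underline{e}}
\qquad
\frac{\bar\Delta;\bar\Gamma, x : \bar\tau;\Sigma \vdash \bar e : \bar\tau' \triangleright \underline{e}}{\lambda x : \bar\tau.\bar e \sim \lambda x.\underline{e}}
\qquad
\frac{\forall i.\ \bar v_i \sim \underline{v}_i}{\{l_i = \bar v_i\}_{i=1}^{n} \sim \langle \underline{v}_{\#(i)} \rangle_{i=1}^{n}}
\]

Then, we show that this relation is preserved during evaluation of $P_1$ and $P_2$.
However, the numbers of evaluation steps may not be equal. In particular,
the number of evaluation steps of LRec is always larger than that of IL since the
translation may introduce more transitions in LRec. For example, the index passing
mechanism adds more computations (ty/abs and ty/app) and translating from
records to slices adds additional let expressions (r). Therefore, we use $\underline{e} \mapsto^{+} \underline{e}'$
instead of $\underline{e} \mapsto \underline{e}'$.

\[
\frac{}{\bar\Delta;\bar\Gamma;\Sigma \vdash n : \mathrm{int} \triangleright n}\ (\textsc{int})
\qquad
\frac{\bar\Delta;\bar\Gamma \vdash x : \bar\tau}{\bar\Delta;\bar\Gamma;\Sigma \vdash x : \bar\tau \triangleright x}\ (\textsc{var})
\]

\[
\frac{\bar\Delta;\bar\Gamma, x : \bar\tau';\Sigma \vdash \bar e : \bar\tau \triangleright \underline{e}}
     {\bar\Delta;\bar\Gamma;\Sigma \vdash \lambda x : \bar\tau'.\bar e : \bar\tau' \to \bar\tau \triangleright \lambda x.\underline{e}}\ (\textsc{fun})
\]

\[
\frac{\bar\Delta;\bar\Gamma;\Sigma \vdash \bar e_1 : \bar\tau_2 \to \bar\tau \triangleright \underline{e}_1 \qquad \bar\Delta;\bar\Gamma;\Sigma \vdash \bar e_2 : \bar\tau_2 \triangleright \underline{e}_2}
     {\bar\Delta;\bar\Gamma;\Sigma \vdash \bar e_1\ \bar e_2 : \bar\tau \triangleright \underline{e}_1\ \underline{e}_2}\ (\textsc{app})
\]

\[
\frac{\bar\Delta;\bar\Gamma;\Sigma \vdash \bar e_1 : \bar\tau_1 \triangleright \underline{e}_1 \qquad \bar\Delta;\bar\Gamma, x : \bar\tau_1;\Sigma \vdash \bar e_2 : \bar\tau_2 \triangleright \underline{e}_2}
     {\bar\Delta;\bar\Gamma;\Sigma \vdash \mathbf{let}\ x : \bar\tau_1 = \bar e_1\ \mathbf{in}\ \bar e_2 : \bar\tau_2 \triangleright \mathbf{let}\ x = \underline{e}_1\ \mathbf{in}\ \underline{e}_2}\ (\textsc{let})
\]

\[
\frac{\begin{array}{c}
\bar\Delta;\bar\Gamma, f : \bar\tau_2 \to \bar\tau_1, x : \bar\tau_2;\Sigma \vdash \bar e_1 : \bar\tau_1 \triangleright \underline{e}_1 \\
\bar\Delta;\bar\Gamma, f : \bar\tau_2 \to \bar\tau_1, x : \bar\tau_2;\Sigma \vdash \bar e_2 : \bar\tau \triangleright \underline{e}_2
\end{array}}
{\bar\Delta;\bar\Gamma;\Sigma \vdash \mathbf{letrec}\ f : \bar\tau_2 \to \bar\tau_1 = \lambda x : \bar\tau_2.\bar e_1\ \mathbf{in}\ \bar e_2 : \bar\tau \triangleright \mathbf{letrec}\ f = \lambda x.\underline{e}_1\ \mathbf{in}\ \underline{e}_2}\ (\textsc{letrec})
\]

\[
\frac{\begin{array}{c}
\bar\Delta, \alpha : \kappa;\bar\Gamma, f : \forall\alpha : \kappa.\bar\tau_1;\Sigma, \alpha \mapsto \{(l_1, x_1), \ldots, (l_n, x_n), (\$len, x)\} \vdash \bar e_1 : \bar\tau_1 \triangleright \underline{e}_1 \\
\bar\Delta, \alpha : \kappa;\bar\Gamma, f : \forall\alpha : \kappa.\bar\tau_1;\Sigma \vdash \bar e_2 : \bar\tau \triangleright \underline{e}_2 \qquad \kappa = \{l_1, \ldots, l_n\}
\end{array}}
{\begin{array}{l}
\bar\Delta;\bar\Gamma;\Sigma \vdash \mathbf{letrec}\ f : \forall\alpha : \kappa.\bar\tau_1 = \Lambda\alpha : \kappa.\bar e_1\ \mathbf{in}\ \bar e_2 : \bar\tau \\
\quad\triangleright\ \mathbf{letrec}\ f = \lambda x_1 \ldots \lambda x_n.\underline{e}_1\ \mathbf{in}\ \underline{e}_2
\end{array}}\ (\textsc{ty/letrec})
\]

\[
\frac{\bar\Delta, \alpha : \kappa;\bar\Gamma;\Sigma, \alpha \mapsto \{(l_1, x_1), \ldots, (l_n, x_n), (\$len, x)\} \vdash \bar e : \bar\tau \triangleright \underline{e} \qquad \kappa = \{l_1, \ldots, l_n\}}
     {\bar\Delta;\bar\Gamma;\Sigma \vdash \Lambda\alpha : \kappa.\bar e : \forall\alpha : \kappa.\bar\tau \triangleright \lambda x_1 \ldots \lambda x_n.\lambda x.\underline{e}}\ (\textsc{ty/abs})
\]

\[
\frac{\begin{array}{c}
\bar\Delta;\bar\Gamma;\Sigma \vdash \bar e : \forall\alpha : \kappa.\bar\tau \triangleright \underline{e} \qquad \bar\Delta \vdash \bar\tau' : \kappa \qquad (L, \bar\rho) = \mathrm{labels}(\bar\tau') \\
\kappa = \{l_1, \ldots, l_n\} \qquad \forall i \in \{1, \ldots, n\}.\ \underline{e}_i = \mathrm{indexOf}(\Sigma, l_i, (L \cup \kappa, \bar\rho)) \\
\underline{e}' = \mathrm{lengthOf}(\Sigma, \mathrm{labels}(\bar\tau'))
\end{array}}
{\bar\Delta;\bar\Gamma;\Sigma \vdash \bar e[\bar\tau'] : \bar\tau[\bar\tau'/\alpha] \triangleright \underline{e}\ \underline{e}_1 \ldots \underline{e}_n\ \underline{e}'}\ (\textsc{ty/app})
\]

Figure 3.33: The translation from IL into LRec for basic computations.

\[
\frac{\bar\Delta;\bar\Gamma;\Sigma \vdash \bar e : \{l : \bar\tau, \bar\rho\} \triangleright \underline{e} \qquad \underline{e}' = \mathrm{indexOf}(\Sigma, l, \mathrm{labels}(\{l : \bar\tau, \bar\rho\}))}
     {\bar\Delta;\bar\Gamma;\Sigma \vdash \bar e.l : \bar\tau \triangleright \underline{e}.\underline{e}'}\ (\textsc{select})
\]

\[
\frac{\{l_{\#(1)}, \ldots, l_{\#(n)}\} = \{l_1, \ldots, l_n\} \qquad \forall i.\ (\bar\Delta;\bar\Gamma;\Sigma \vdash \bar e_i : \bar\tau_i \triangleright \underline{e}_i)}
     {\bar\Delta;\bar\Gamma;\Sigma \vdash \{l_i = \bar e_i\}_{i=1}^{n} : \{l_i : \bar\tau_i\}_{i=1}^{n} \triangleright \mathbf{let}\ x_1 = \underline{e}_1\ \mathbf{in} \ldots \mathbf{let}\ x_n = \underline{e}_n\ \mathbf{in}\ \langle x_{\#(i)} \rangle_{i=1}^{n}}\ (\textsc{r})
\]

\[
\frac{\begin{array}{c}
\bar\Delta;\bar\Gamma;\Sigma \vdash \bar e_1 : \{\bar\rho\} \triangleright \underline{e}_1 \\
\bar\Delta;\bar\Gamma;\Sigma \vdash \bar e_2 : \bar\tau_2 \triangleright \underline{e}_2 \qquad \underline{e}_0 = \mathrm{indexOf}(\Sigma, l, \mathrm{labels}(\{\bar\rho\}))
\end{array}}
{\bar\Delta;\bar\Gamma;\Sigma \vdash \bar e_1 \otimes \{l = \bar e_2\} : \{l : \bar\tau_2, \bar\rho\} \triangleright \mathbf{let}\ x = \underline{e}_1\ \mathbf{in}\ \langle (x, 0, \underline{e}_0),\ \underline{e}_2,\ (x, \underline{e}_0, \mathrm{len}(x)) \rangle}\ (\textsc{r/ext})
\]

\[
\frac{\bar\Delta;\bar\Gamma;\Sigma \vdash \bar e : \{l : \bar\tau, \bar\rho\} \triangleright \underline{e} \qquad \underline{e}_0 = \mathrm{indexOf}(\Sigma, l, \mathrm{labels}(\{l : \bar\tau, \bar\rho\}))}
     {\bar\Delta;\bar\Gamma;\Sigma \vdash \bar e \oslash l : \{\bar\rho\} \triangleright \mathbf{let}\ x = \underline{e}\ \mathbf{in}\ \langle (x, 0, \underline{e}_0),\ (x, \underline{e}_0 + 1, \mathrm{len}(x)) \rangle}\ (\textsc{r/sub})
\]

\[
\frac{\bar\Delta;\bar\Gamma;\Sigma \vdash \bar e : \langle\bar\rho\rangle \to \bar\tau \triangleright \underline{e} \qquad \underline{e}' = \mathrm{lengthOf}(\Sigma, \mathrm{labels}(\{\bar\rho\}))}
{\begin{array}{l}
\bar\Delta;\bar\Gamma;\Sigma \vdash \mathbf{reify}[\bar\rho][\bar\tau]\ \bar e : \{\bar\rho \to \bar\tau\} \\
\quad\triangleright\ \mathbf{letrec}\ f = \lambda x_e.\lambda x_{e'}.\lambda n.\lambda t. \\
\qquad\quad \mathbf{ifzero}\,(x_{e'},\ t, \\
\qquad\qquad f\ x_e\ (x_{e'} - 1)\ (n + 1)\ \langle t,\ \lambda x_n.\, x_e\ (\lambda c.\, c.n\ x_n) \rangle) \\
\qquad \mathbf{in}\ f\ \underline{e}\ \underline{e}'\ 1\ \langle\rangle
\end{array}}\ (\textsc{T-reify})
\]

Figure 3.34: The translation from IL into LRec for computations involving records.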

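Before we proceed to establishing the main theorem, we set up a few helper lemmas:

Lemma 3.5.2 (Substitution)

\[
\frac{\bar\Delta;\bar\Gamma, x : \bar\tau';\Sigma \vdash \bar e : \bar\tau \triangleright \underline{e} \qquad \bar\Delta;\bar\Gamma;\Sigma \vdash \bar v : \bar\tau' \triangleright \underline{v}}
     {\bar e[\bar v/x] \sim \underline{e}[\underline{v}/x]}
\]

Proof: By induction on $\triangleright$. ■

Lemma 3.5.3 (Type substitution)

\[
\frac{\begin{array}{c}
\bar\Delta, \alpha : \kappa;\bar\Gamma;\Sigma, \alpha \mapsto \{(l_1, x_1), \ldots, (l_n, x_n)\} \vdash \bar e : \bar\tau \triangleright \underline{e} \\
\bar\Delta \vdash \bar\tau' : \kappa \qquad (L, \bar\rho) = \mathrm{labels}(\bar\tau') \qquad \kappa = \{l_1, \ldots, l_n\} \\
\forall i \in 1..n.\ \underline{e}_i \mapsto^{*} \underline{v}_i \quad\text{where } \underline{e}_i = \mathrm{indexOf}(\Sigma, l_i, (L \cup \kappa, \bar\rho))
\end{array}}
{\bar e[\bar\tau'/\alpha] \sim \underline{e}[\underline{v}_1/x_1, \ldots, \underline{v}_n/x_n]}
\]

Proof: By induction on $\triangleright$. ■

Lemma 3.5.4

If $\bar e \sim \underline{e}$ and $\bar e \mapsto \bar e'$, then $\exists\, \underline{e}'$ such that $\underline{e} \mapsto^{+} \underline{e}'$ and $\bar e' \sim \underline{e}'$.

Proof: By induction on a derivation of $\bar e \sim \underline{e}$ (i.e., $\bar\Delta;\bar\Gamma;\Sigma \vdash \bar e : \bar\tau \triangleright \underline{e}$). At each step
of induction, we assume that the desired property holds for all subderivations and
proceed by cases on the possible shape of $\bar e$:

• Case INT, VAR, FUN: Already values. Not applicable.

• Case APP: $\bar\Delta;\bar\Gamma;\Sigma \vdash \bar e_1\ \bar e_2 : \bar\tau \triangleright \underline{e}_1\ \underline{e}_2$. There are three subcases on whether $\bar e_1$
and $\bar e_2$ are values or not:

- Subcase: Neither. Then, by assumption, $\bar e_1\ \bar e_2 \mapsto \bar e_1'\ \bar e_2$. By $\mapsto$ of IL, we know
that $\bar e_1 \mapsto \bar e_1'$ (1). By inversion of APP, we also know that $\bar e_1 \sim \underline{e}_1$ (2). By IH
with (1) and (2), there exists $\underline{e}_1'$ such that $\underline{e}_1 \mapsto^{+} \underline{e}_1'$ and $\bar e_1' \sim \underline{e}_1'$. By APP,
therefore, there exists $\underline{e}_1'\ \underline{e}_2$ such that $\bar e_1'\ \bar e_2 \sim \underline{e}_1'\ \underline{e}_2$ and $\underline{e}_1\ \underline{e}_2 \mapsto^{+} \underline{e}_1'\ \underline{e}_2$.

- Subcase: Only $\bar e_1$ is a value. Similar.

- Subcase: Both are values. Then, by assumption, $(\lambda x : \bar\tau'.\bar e_1')\ \bar v_2 \mapsto \bar e_1'[\bar v_2/x]$.
By inversion of APP and FUN, we know that $\lambda x : \bar\tau'.\bar e_1' \sim \lambda x.\underline{e}_1'$ and furthermore,
$\bar\Delta;\bar\Gamma, x : \bar\tau';\Sigma \vdash \bar e_1' : \bar\tau \triangleright \underline{e}_1'$ (1). At the same time, $\bar v_2 \sim \underline{e}_2$. There are
two cases on whether $\underline{e}_2$ is a value or not. If $\underline{e}_2$ is not a value, then it
must have the form of a let expression which eventually becomes a value
(i.e., slices) in a few steps. Therefore, we can safely assume that $\underline{e}_2$ is a
value ($\underline{v}_2$). Then $(\lambda x.\underline{e}_1')\ \underline{v}_2 \mapsto \underline{e}_1'[\underline{v}_2/x]$ and also, by Lemma 3.5.2 with (1)
and $\bar v_2 \sim \underline{v}_2$, $\bar e_1'[\bar v_2/x] \sim \underline{e}_1'[\underline{v}_2/x]$.

• Case LET: $\bar\Delta;\bar\Gamma;\Sigma \vdash \mathbf{let}\ x : \bar\tau_1 = \bar e_1\ \mathbf{in}\ \bar e_2 : \bar\tau_2 \triangleright \mathbf{let}\ x = \underline{e}_1\ \mathbf{in}\ \underline{e}_2$. There are two
subcases on whether $\bar e_1$ and $\bar e_2$ are values or not. Then, similar to the case APP.

• Case LETREC: $\bar\Delta;\bar\Gamma;\Sigma \vdash \mathbf{letrec}\ f : \bar\tau_2 \to \bar\tau_1 = \lambda x : \bar\tau_2.\bar e_1\ \mathbf{in}\ \bar e_2 : \bar\tau \triangleright \mathbf{letrec}\ f =
\lambda x.\underline{e}_1\ \mathbf{in}\ \underline{e}_2$. By inversion of LETREC, we have $\bar e_1 \sim \underline{e}_1$ and $\bar e_2 \sim \underline{e}_2$ under $\bar\Gamma, f :
\bar\tau_2 \to \bar\tau_1, x : \bar\tau_2$. Then, by Lemma 3.5.2 we can easily show that $\bar e_2[\bar v/f] \sim
\underline{e}_2[\underline{v}/f]$ where $\bar v = \lambda x.(\bar e_1[\mathbf{letrec}\ f = \lambda x.\bar e_1\ \mathbf{in}\ f])$ and $\underline{v} = \lambda x.(\underline{e}_1[\mathbf{letrec}\ f =
\lambda x.\underline{e}_1\ \mathbf{in}\ f])$ and $\bar v \sim \underline{v}$.

• Case ty/letrec: Similar to the case LETREC.

• Case ty/abs: Not applicable.

• Case ty/app: $\bar\Delta;\bar\Gamma;\Sigma \vdash \bar e[\bar\tau'] : \bar\tau[\bar\tau'/\alpha] \triangleright \underline{e}\ \underline{e}_1 \ldots \underline{e}_n$. There are two subcases on
whether $\bar e$ is a value or not:

- Subcase: $\bar e$ is not a value. Then, by assumption, we have $\bar e[\bar\tau'] \mapsto \bar e'[\bar\tau']$ which
implies $\bar e \mapsto \bar e'$ (1). Then, by IH with (1) and $\bar e \sim \underline{e}$, there exists $\underline{e}'$
which satisfies $\underline{e} \mapsto^{+} \underline{e}'$ and $\bar e' \sim \underline{e}'$. Therefore, by $\mapsto$ of LRec, $\underline{e}\ \underline{e}_1 \ldots \underline{e}_n \mapsto^{+}
\underline{e}'\ \underline{e}_1 \ldots \underline{e}_n$ and $\bar e'[\bar\tau'] \sim \underline{e}'\ \underline{e}_1 \ldots \underline{e}_n$.

- Subcase: $\bar e$ is a value. Then, by Lemma 3.4.3 (the canonical lemma), it is
$\Lambda\alpha : \kappa.\bar e'$. Then, by ty/abs, $\Lambda\alpha : \kappa.\bar e' \sim \lambda x_1 \ldots \lambda x_n.\underline{e}'$. By inversion of ty/abs
and Lemma 3.5.3 we can see that $\bar e'[\bar\tau'/\alpha] \sim \underline{e}'[\underline{v}_1/x_1, \ldots, \underline{v}_n/x_n]$.

• Case SELECT: $\bar\Delta;\bar\Gamma;\Sigma \vdash \bar e.l : \bar\tau \triangleright \underline{e}.\underline{e}_l$. There are two subcases on whether $\bar e$ is
a value or not:

- Subcase: $\bar e$ is not a value. By assumption, we have $\bar e \mapsto \bar e'$. We can easily get
$\bar e'.l \sim \underline{e}'.\underline{e}_l$.

- Subcase: $\bar e$ is a value. Then, by Lemma 3.4.3 and SELECT, $\{\ldots, l_i = \bar v_i, \ldots\}.l \sim
\underline{e}.\underline{e}_l$ where $\underline{e} = \mathbf{let}\ x_1 = \underline{v}_1\ \mathbf{in} \ldots \mathbf{let}\ x_n = \underline{v}_n\ \mathbf{in}\ \langle x_{\#(i)} \rangle_{i=1}^{n}$ and $\underline{e}_l =
\mathrm{indexOf}(\Sigma, l, \mathrm{labels}(\{l : \bar\tau, \bar\rho\}))$. By select, $\{\ldots, l_i = \bar v_i, \ldots\}.l \mapsto \bar v_l$. Similarly,
$\underline{e}.\underline{e}_l \mapsto^{+} \langle \underline{v}_{\#(i)} \rangle_{i=1}^{n}.i \mapsto \underline{v}_{\#(l)}$, and we can easily show the existence
of $\underline{v}_{\#(l)}$ such that $\bar v_l \sim \underline{v}_{\#(l)}$ and $\underline{e}.\underline{e}_l \mapsto^{+} \underline{v}_{\#(l)}$.

• Case R: $\bar\Delta;\bar\Gamma;\Sigma \vdash \{l_i = \bar e_i\}_{i=1}^{n} : \{l_i : \bar\tau_i\}_{i=1}^{n} \triangleright \mathbf{let}\ x_1 = \underline{e}_1\ \mathbf{in} \ldots \mathbf{let}\ x_n = \underline{e}_n\ \mathbf{in}
\langle x_{\#(i)} \rangle_{i=1}^{n}$ and $\bar e_i \sim \underline{e}_i$ for $1 \le i \le n$. By assumption, $\bar e_i \mapsto \bar e_i'$ and by IH,
there exists $\underline{e}_i'$, which makes the rest straightforward.

• Case r/ext: $\bar\Delta;\bar\Gamma;\Sigma \vdash \bar e_1 \otimes \{l = \bar e_2\} : \{l : \bar\tau_2, \bar\rho\} \triangleright \mathbf{let}\ x = \underline{e}_1\ \mathbf{in}\ \langle (x, 0, \underline{e}_0), \underline{e}_2, (x, \underline{e}_0, n) \rangle$.
There are two subcases. If either $\bar e_1$ or $\bar e_2$ is not a value, then the proof is
straightforward. If both are values, we assume that $\{l_1 = \bar v_1, \ldots, l_n = \bar v_n\} \otimes \{l = \bar v\} \mapsto
\{l_1 = \bar v_1, \ldots, l_n = \bar v_n, l = \bar v\}$. Similarly, $\mathbf{let}\ x = \underline{e}_1\ \mathbf{in}\ \langle (x, 0, \underline{e}_0), \underline{e}_2, (x, \underline{e}_0, n) \rangle \mapsto^{+}
\langle \underline{v}_{\#(i)} \rangle_{i=1}^{n+1}$, where $\#$ denotes slice sorting. Then, by Definition 3.5.1 and
by IH, $\{l_1 = \bar v_1, \ldots, l_n = \bar v_n, l = \bar v\} \sim \langle \underline{v}_{\#(i)} \rangle_{i=1}^{n+1}$.

• Case r/sub: Similar to the case r/ext.

• Case T-REIFY: $\bar\Delta;\bar\Gamma;\Sigma \vdash \mathbf{reify}[\bar\rho][\bar\tau]\ \bar e : \{\bar\rho \to \bar\tau\} \triangleright \mathbf{letrec}\ f = \ldots \mathbf{in}\ f\ \underline{e}\ \underline{e}'\ 1\ \langle\rangle$.
If $\bar e$ is not a value, the proof is straightforward. If it is a value, by reify,
$\mathbf{reify}[l_1 : \bar\tau_1, \ldots, l_n : \bar\tau_n, \cdot][\bar\tau]\ \bar v \mapsto \{l_i = \lambda x_i : \bar\tau_i.\ \bar v\ (\Lambda\alpha : \star.\lambda c : \{l_j : \bar\tau_j \to \alpha\}_{j=1}^{n}.\ c.l_i\ x_i)\}_{i=1}^{n}$.
By $\mapsto$ of LRec, $\mathbf{letrec}\ f = \lambda x_e.\lambda x_{e'}.\lambda n.\lambda t.\ \mathbf{ifzero}\,(x_{e'}, t, f\ x_e\ (x_{e'} - 1)\ (n + 1)
\langle t, \lambda x_n.\, x_e\ (\lambda c.\, c.n\ x_n) \rangle)\ \mathbf{in}\ f\ \underline{v}\ n\ 1\ \langle\rangle \mapsto^{+} \langle \lambda x_i.\, \underline{v}\ (\lambda c.\, c.i\ x_i) \rangle_{i=1}^{n}$. By
the fact that $\bar v \sim \underline{v}$, $\{l_i = \lambda x_i : \bar\tau_i.\ \bar v\ (\Lambda\alpha : \star.\lambda c : \{l_j : \bar\tau_j \to \alpha\}_{j=1}^{n}.\ c.l_i\ x_i)\}_{i=1}^{n} \sim
\langle \lambda x_i.\, \underline{v}\ (\lambda c.\, c.i\ x_i) \rangle_{i=1}^{n}$. ■

Theorem 3.5.5

Let $P_1$ be an IL program of type int and $P_2$ an LRec program obtained by applying $\triangleright$.
Then, whenever $P_1$ evaluates to $n$, $P_2$ evaluates to $n$.

Proof: $\emptyset;\emptyset;\emptyset \vdash \bar e : \mathrm{int} \triangleright \underline{e}$ and $\bar e \mapsto^{*} n$ immediately imply that $\underline{e} \mapsto^{*} n$ by
Definition 3.5.1 and Lemma 3.5.4. ■

3.6 Implementation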

We have implemented a prototype compiler for the MLPolyR language in Standard
ML. It retains all of the features that we have discussed, including row polymorphism
for records and sums, polymorphic sums, extensible first-class cases as well
as type-safe exception handlers. The compiler produces machine code for the PowerPC
architecture that can run on Apple Macintosh computers. It also supports an x86
backend based on C-- (Jones et al. 1999).

3.6.1 Compiler Phases

The compiler is structured in a fairly traditional way and consists of the following
phases:

• lexer: lexical analysis, tokenization

• parser: LALR(1) parser, generating abstract syntax trees (AST)

• elaborator: perform type reconstruction and generation of annotated abstract
syntax (Absyn)

• translate: generate index-passing LRec code

• anf-convert: convert LRec code into A-normal form (Flanagan et al. 1993)

• anf-optimize: perform various optimizations including flattening, uncurrying,
constant folding, simple constant and value propagation, elimination of useless
bindings, short-circuit selection from known tuples, inlining of tiny functions, and some
arithmetic expression simplification

• closure: convert to first-order code by closure conversion

• clusters: separate closure-converted blocks into clusters of blocks; each cluster
roughly corresponds to a single C function but may have multiple entry points

• value-numbering: perform simple common subexpression elimination (CSE) within basic
blocks

• treeify: re-grow larger expression trees to make tree-tiling instruction selection
more useful

• traceschedule: arrange basic blocks to minimize unconditional jumps

• cg: perform instruction selection by tree-tiling (maximum-munch algorithm) and
graph-coloring register allocation; emit assembly code

Each phase is implemented in a separate module and a main driver calls them
sequentially, as illustrated in Figure 3.35.

(* val main : string * string list -> OS.Process.status *)
fun main (self, args) =
    let val file = Command.parse args
        val ast = Parse.parse file
        val absyn = Elaborate.elaborate ast
        val lambda = Translate.translate absyn
        val anf = LambdaToANF.convert lambda
        val anf_op = Optimize.optimize anf
        (* closure conversion consumes the optimized ANF *)
        val closed = Closure.convert anf_op
        val {entrylabel, clusters} = Clusters.clusterify closed
        val clusters_cse = ValueNumbering.cse clusters
        val bbt_clusters = Treeify.treeify clusters_cse
        val traces = TraceSchedule.schedule bbt_clusters
        val _ = CodeGen.codegen (traces, entrylabel, file)
    in OS.Process.success
    end

Figure 3.35: A main driver for the MLPolyR compiler.

val String : { cmdline_args : string list,
               cmdline_pgm  : string,
               compare      : string * string -> int,
               concat       : string list -> string,
               fromint      : int -> string,
               inputLine    : () -> string,
               output       : string -> (),
               size         : string -> int,
               sub          : string * int -> int,
               substring    : string * int * int -> string,
               toint        : string -> int }

Figure 3.36: MLPolyR supports minimal built-in functions which perform simple
I/O tasks and string manipulations.

3.6.2 Runtime system

The runtime system, written in C, implements a simple two-space copying garbage
collector (Pierce 2002) and provides basic facilities for input and output.
For the tracing garbage collector to be able to reliably distinguish between pointers
and integers, we employ the usual tagging trick. Integers are 31-bit 2's-complement
numbers. An integer value i is represented internally as a 2's-complement 32-bit
quantity of value 2i. This makes all integers even, with their least significant bits
cleared. Heap pointers, on the other hand, are represented as odd 32-bit values. In
effect, instead of pointing to the beginning of a word-aligned heap object, they point
to the object's second byte. Generated load and store instructions account for this
skew by using an accordingly adjusted displacement value. With this representation
trick, the most common arithmetic operations (addition and subtraction) can be
implemented as single instructions as usual; they do not need to manipulate tag bits.
The same is true for most loads and stores.
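
The arithmetic behind this representation can be checked with a few lines of Python. This is a sketch of the encoding only, not of the actual runtime: the function names are hypothetical, machine words are simulated by masking to 32 bits, heap words are assumed to be 4 bytes wide, and overflow of the 31-bit range is not handled.

```python
MASK32 = 0xFFFFFFFF

def tag_int(i):
    """Represent the 31-bit integer i as the 32-bit word 2*i (always even)."""
    assert -(1 << 30) <= i < (1 << 30)
    return (2 * i) & MASK32

def untag_int(w):
    """Reinterpret w as a signed 32-bit value and halve it."""
    if w & 0x80000000:
        w -= 1 << 32
    return w >> 1

def is_int(w):
    """Integers have a cleared least significant bit; pointers have it set."""
    return (w & 1) == 0

def add_tagged(a, b):
    """Addition works directly on tagged words: 2i + 2j = 2(i + j)."""
    return (a + b) & MASK32

def tag_ptr(addr):
    """A heap pointer is the word-aligned address plus one (always odd)."""
    assert addr % 4 == 0
    return addr + 1

def load_displacement(field_index):
    """Displacement a load uses to undo the one-byte skew of a tagged pointer."""
    return 4 * field_index - 1

total = untag_int(add_tagged(tag_int(3), tag_int(-5)))
field_addr = tag_ptr(0x1000) + load_displacement(2)
```

Because the skew is folded into the displacement, a load through a tagged pointer costs no extra instruction, which is the point the paragraph above makes about loads and stores.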

MLPolyR also supports minimal built-in functions as a record value bound to
the global variable String, as shown in Figure 3.36. This record is allocated using
C code and does not reside within the MLPolyR heap. It contains routines for
manipulating string values, for converting from and to strings, and for performing
simple I/O operations. Each routine can be accessed by dot notation. For example,
String.compare could be used to compare two string values. Their implementations
are hidden inside the MLPolyR runtime system.



CHAPTER 4 
LARGE-SCALE EXTENSIBLE PROGRAMMING 



Today most programming languages support programming at the large scale by breaking
programs into pieces and developing these pieces separately. For example, the
Standard ML module language provides mechanisms for structuring programs into
separate units called structures. Each structure has its own namespace and they
are hierarchically composable so that one structure can contain other structures.
The Standard ML module system also supports module-level parameterization which
makes code reuse easy.

In this chapter, we propose a module system for MLPolyR. It is an ML-like
module system that supports separate compilation and independent
extension in the presence of polymorphic records, first-class cases and type-safe exception
handlers. After presenting the module language, we will discuss a way to implement it
by translating module language terms into ordinary MLPolyR core language terms,
and we will also discuss how to support separate compilation. Then, we will revisit
the elaborated expression problem by Zenger and Odersky (Zenger and Odersky 2005)
with our module-level solution.

4.1 The module system

The syntax of our proposed module language is presented in Figure 4.1. We use $X$
and $T$ as meta-variables for module names and template names, respectively. The
core language ($e$) is extended to support the dot notation ($X.x$) for accessing a
component (named $x$) in a module ($X$). A module itself consists of a sequence of value
components ($\{\{C_1 \ldots C_n\}\}$). A value component is defined as a value declaration
(val $x = e_m$). A component can also be added to a module ($M \boxplus \{\{C\}\}$) or
removed from it ($M \boxminus x$). A module can also be obtained by applying a template to
modules ($T(M_1, \ldots, M_n)$). A program is a sequence of declarations, which can be either
definitions of modules or those of templates. A template can take other modules as
arguments.

\[
\begin{array}{llcl}
\text{Terms}        & e_m & ::= & e \mid X.x \\
\text{Modules}      & M   & ::= & \{\{C_1 \ldots C_n\}\} \mid M \boxplus \{\{C\}\} \mid M \boxminus x \mid T(M_1, \ldots, M_n) \\
\text{Components}   & C   & ::= & \mathbf{val}\ x = e_m \\
\text{Declarations} & D   & ::= & \mathbf{module}\ X = M \mid \mathbf{template}\ T(X_1, \ldots, X_n) = M \\
\text{Program}      & P   & ::= & D_1 \ldots D_n
\end{array}
\]

Figure 4.1: The syntax for the module language.

We treat modules as packages that contain only value components, so the module
language does not have type components, unlike the SML module language. For
example, we can define a module Queue which contains basic operations such as
insert and delete:

module Queue = {{
    val empty = []
    fun insert (q, x) = List.rev (x :: (List.rev q))
    fun delete q = case q of
                       [] => raise `Empty ()
                     | h :: tl => (tl, h)
}}



Each component in the module can be accessed by the usual dot notation: e.g.,
Queue.empty or Queue.insert(q, 5). Then, we can add more operations by extending
the basic Queue into EQueue:

module EQueue = Queue with {{
    fun size q = List.length q
    fun insertLog (q, x) = (log "insert"; Queue.insert (q, x))
}}

where the clause with is syntactic sugar for $M \boxplus \{\{D\}\}$.

We may consider a priority queue which retrieves the element with the highest
priority. In our implementation, we only have to modify the function insert so
that a sorted list is built at insertion time:

module IntPriorityQueue = Queue where {{
    fun insert (q, x) = case q of
        ...
}}

where the clause where is syntactic sugar for $(M \boxminus l) \boxplus \{\{D\}\}$, similar to a record
update operator. However, this priority queue works only over integers. Alternatively,
we may keep queues in alphabetical order, and then the code should be changed as
follows:

module StrPriorityQueue = Queue where {{
    fun insert (q, x) = case q of
        ...
}}

We can make code more reusable by generalizing this code so that it can work over
any type. Similar to functors in the Standard ML module system, we provide a
parameterization mechanism called a template, which takes other modules as arguments.
For example, we can parameterize over a comparison function, so that a priority queue
can work over any type depending on its argument:

template PriorityQueue (Order) = Queue where {{
    fun insert (q, x) = case q of
                            [] => [x]
                          | h :: tl => if (Order.lt (x, h)) then x :: q
                                       else h :: (insert (tl, x))
}}

Unlike functors, we do not impose any type constraints except that the module Order
should have a component named lt. By applying this template to any module that
has the component lt, a new priority queue can be instantiated:

module IntPriorityQueue = PriorityQueue (IntOrder)
module StrPriorityQueue = PriorityQueue (StrOrder)

where IntOrder and StrOrder can be implemented as follows:

module IntOrder = {{
    fun lt (x, y) = x > y
}}

module StrOrder = {{
    fun lt (x, y) = String.compare (x, y) > 0
}}


4.2 An implementation of the module language 

Our main idea for implementing the module language is to translate the module
language constructs into ordinary MLPolyR core language ones. In particular, we can
take advantage of the fact that each operator on module expressions has a
corresponding record operator, as illustrated in Table 4.1.

                 record e                           module M
  Introduction   {l_1 = e_1, ..., l_n = e_n}        {{val l_1 = e_1, ..., val l_n = e_n}}
  Selection      r.l                                M.l
  Extension      r ⊗ {l = e}                        M ⊞ {{val l = e}}
  Subtraction    r ⊘ l                              M ⊟ l

Table 4.1: Symmetry between record and module operations.
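
The correspondence in Table 4.1 can be tried out in any language with first-class records. The following Python sketch uses dicts as records and plain functions as templates. It is an untyped illustration only: the helper names `extend`, `subtract` and `where` are hypothetical, the queue operations are simplified list versions of the ones in Section 4.1, and none of the static guarantees of MLPolyR's row types are modelled.

```python
def extend(module, name, value):
    """M extended with a new component; the name must not be present yet."""
    if name in module:
        raise KeyError(f"component {name!r} already defined")
    return {**module, name: value}

def subtract(module, name):
    """M with the component `name` removed."""
    rest = dict(module)
    del rest[name]
    return rest

def where(module, name, value):
    """Replacement: subtraction followed by extension."""
    return extend(subtract(module, name), name, value)

# module Queue = {{ ... }} becomes a record of its components.
Queue = {
    "empty": [],
    "insert": lambda q, x: q + [x],
    "delete": lambda q: (q[1:], q[0]),
}

# template PriorityQueue (Order) = Queue where {{ ... }} becomes a function.
def PriorityQueue(Order):
    def insert(q, x):
        if not q:
            return [x]
        h, tl = q[0], q[1:]
        return [x] + q if Order["lt"](x, h) else [h] + insert(tl, x)
    return where(Queue, "insert", insert)

IntOrder = {"lt": lambda x, y: x > y}
IntPriorityQueue = PriorityQueue(IntOrder)

q = IntPriorityQueue["empty"]
for x in [3, 1, 2]:
    q = IntPriorityQueue["insert"](q, x)
rest, top = IntPriorityQueue["delete"](q)
```

Note that `Queue` itself is left unchanged by `where`: each operation builds a new record, which matches the functional reading of the module operators.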



For example, the module Queue can be translated into a record:

val Queue =
    let val empty = []
        fun insert (q, x) = ...
        fun delete q = ...
    in { empty = empty,
         insert = insert,
         delete = delete
       }
    end



where all components are exposed as record fields. In the case of the module EQueue, we
need polymorphic and extensible records, which EL provides:

val EQueue =
    let fun size q = ...
        fun insertLog (q, x) = ...
    in { size = size,
         insertLog = insertLog,
         ... = Queue
       }
    end



Similarly, we can translate the module IntPriorityQueue into a record in which the
field insert is replaced:

val IntPriorityQueue =
    let fun insert' (q, x) = ...
        val {insert, ... = rest} = Queue
    in { insert = insert',
         ... = rest
       }
    end

A template becomes a function taking arguments and producing a module (i.e., a
record). For example, the template PriorityQueue is translated as follows:

val PriorityQueue = fn Order =>
    let fun insert' (q, x) = ... if (Order.lt (x, h)) then ...
        val {insert, ... = rest} = Queue
    in { insert = insert',
         ... = rest
       }
    end


In sum. Figure 14.21 shows the translation rules from module expressions (M) into 
EL expressions (e). 



4.3 Separate compilation 



Separate compilation has been considered as one of key factors for the development 



of extensible software (jZenger and Odersky 



20051 ). Without the support of separate 



compilation, any extensions to the base system may require re-typechecking or re- 
compilation of the existing ones. 

Suppose we have the following program fragment: 



    module EQueue = Queue with {{
      fun size q = List.length q
      fun insertLog (q, x) = (log "insert"; Queue.insert (q, x))
    }}



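It would be surprising if we had to compile the module Queue whenever we compile
the module EQueue, but many extensibility mechanisms require such redos. For
instance, in AspectJ, aspects can clearly modularize all extensions in separate aspect
code (AspectJ 2008). However, their composition does not provide separate
compilation, so it is necessary for base code to be either re-typechecked or re-compiled (or
both) for every composition. If we can compile EQueue without compiling the module
Queue, we would say that they can be compiled separately.

Module expressions, $M \leadsto e$:

$$\frac{l_{x_1}, \ldots, l_{x_n}\ \text{fresh labels} \qquad \forall i \in 1..n.\ em_i \leadsto e'_i}
       {\{\{\mathtt{val}\ x_1 = em_1, \ldots, \mathtt{val}\ x_n = em_n\}\} \leadsto
        \mathtt{let}\ \mathtt{val}\ x_1 = e'_1\ \ldots\ \mathtt{val}\ x_n = e'_n\
        \mathtt{in}\ \{l_{x_1} = x_1, \ldots, l_{x_n} = x_n\}\ \mathtt{end}}
  \quad (\text{Module})$$

$$\frac{M \leadsto e \qquad em \leadsto e' \qquad l_x\ \text{fresh label}}
       {M \boxplus \{\{\mathtt{val}\ x = em\}\} \leadsto
        \mathtt{let}\ \mathtt{val}\ x = e'\ \mathtt{in}\ e \otimes \{l_x = x\}\ \mathtt{end}}
  \quad (\text{Extension})$$

$$\frac{M \leadsto e \qquad l_x\ \text{fresh label for}\ M.x}
       {M \boxminus x \leadsto e \oslash l_x}
  \quad (\text{Subtraction})$$

$$\frac{\forall i \in 1..n.\ M_i \leadsto e_i}
       {T(M_1, \ldots, M_n) \leadsto T(e_1, \ldots, e_n)}
  \quad (\text{Application})$$

Module-level expressions, $em \leadsto e$:

$$\frac{l_x\ \text{fresh label}}{X.x \leadsto X.l_x} \quad (\text{Path})
  \qquad\qquad
  \frac{}{e \leadsto e} \quad (\text{Non-path})$$

Declarations ($D$):

$$\frac{M \leadsto e}
       {\mathtt{module}\ X = M \leadsto \mathtt{val}\ X = e}
  \quad (\text{Module declaration})$$

$$\frac{M \leadsto e}
       {\mathtt{template}\ X\ (X_1, \ldots, X_n) = M \leadsto
        \mathtt{val}\ X = \mathtt{fn}\ (X_1, \ldots, X_n) \Rightarrow e}
  \quad (\text{Template declaration})$$

Figure 4.2: The translation from the module language into the core language.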



Generally, separate compilation can be implemented in two ways (Elsman 2008).
Suppose we want to compile a program fragment P which depends on a module M:

• Incremental compilation does not require explicit type information on M, but 
requires M to be compiled prior to P. 

• (True) separate compilation requires explicit type information on M, but does 
not require the prior compilation of M. 

Because all types are fully inferred, the core language does not require type anno- 
tations. Taking the incremental compilation approach, we may omit type annotation 
even for modules. Some may argue that it would be desirable to explicitly write the 
intended type, especially for the sake of consistency and documentation purposes. 
However, it does not seem practical for a user to spell out all types in MLPolyR 
where a type may contain row types and kind information. For example, consider a
higher-order function such as map:

    fun map f [] = []
      | map f (x::xs) = f x :: map f xs

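Here, map does not raise exceptions but its arguments might. With this in mind,
map's type should be as follows (using Haskell-style notation for list types $[\tau]$):

    val map : $\forall\alpha:\star.\ \forall\beta:\star.\ \forall\gamma:\varnothing.\ \forall\delta:\varnothing.\ (\alpha \xrightarrow{\gamma} \beta) \xrightarrow{\delta} ([\alpha] \xrightarrow{\gamma} [\beta])$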

To avoid the need for such prohibitively verbose programmer annotations,
our approach is to let the type checker infer module signatures and
record them, so that we can use this information later when we typecheck or compile
a program which depends on it. Therefore, our compiler now
produces intermediate information including typing (e.g., foo.t) and machine code
(e.g., foo.l written in LRec) in the following sequence:

                     foo.t                 foo.l
                       ↑                     ↑
    foo.mlpr = EL ------------> IL ------------> LRec ------------> Value
                 Type checking     Compilation       Evaluation

Then, this information will be used during type checking and evaluating bar.mlpr 
which depends on the module defined in foo.mlpr: 



                  bar.t  foo.t             bar.l              foo.l
                    ↑      ↓                 ↑                  ↓
    bar.mlpr = EL ------------> IL ------------> LRec ------------> Value
                 Type checking     Compilation       Evaluation
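
The workflow above can be sketched as follows. This is only a toy model under stated assumptions: the dictionary `interfaces` stands in for the .t files, `check_module` is a hypothetical stand-in for the type checker that merely looks up recorded component names, and the type strings are illustrative.

```python
# Toy model of checking a module against recorded interfaces.
# `interfaces` plays the role of the *.t files produced by the compiler.

interfaces = {}          # module name -> {component: type string}
checked = []             # order in which modules were type checked

def check_module(name, fields, uses=()):
    """Check module `name`, whose body defines `fields`.

    `uses` lists (module, component) pairs the body refers to. Only the
    recorded interface of each dependency is consulted; the dependency
    itself is never checked again.
    """
    for dep, field in uses:
        if dep not in interfaces:
            raise LookupError(f"{dep}.t not found: check {dep} first")
        if field not in interfaces[dep]:
            raise TypeError(f"{dep} has no component {field}")
    interfaces[name] = dict(fields)      # corresponds to writing name.t
    checked.append(name)

check_module("foo", {"insert": "('a list * 'a) -> 'a list"})
check_module("bar", {"size": "'a list -> int"}, uses=[("foo", "insert")])
```

Because `bar` is checked against the stored interface of `foo`, `foo` must have been processed first, which is the defining property of the incremental approach.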



This setup is mostly straightforward, with a few notable exceptions:

• Even though our module language does not have type components, our type
inference creates unification variables, and some of them may escape without
generalization. Here, the subtlety lies in whether the type checker allows them
to escape to the module level. Dreyer and Blume explore this subtlety and
note that many different policies exist regarding how to handle non-generalized
unification variables (Dreyer and Blume 2006). According to their work, the
SML/NJ compiler does not allow unification variables to escape. Even though this
policy has the benefit of being consistent and predictable, it can be too restrictive
in some cases. Suppose we have the following code in SML:

    structure A =
    struct
      val idO = fn x => x
      val id = idO idO
    end

    val _ = A.id "hello"



The SML/NJ compiler rejects this code. The MLton compiler is more liberal and
accepts it, but it requires access to the whole program. Since
we do not have type components, we can adopt such a liberal policy relatively
easily. We allow non-generalized unification variables to escape up to the module
level in a similar way to MLton, but we also manage to support separate
compilation. Consider the following example:

    module IDO = {{
      val idO = fn x => x
      val id = idO idO
    }}



where idO has the polymorphic type $\forall\alpha.\ \alpha \to \alpha$ but id has the monomorphic type
$\beta \to \beta$. Note that $\beta$ is not a polymorphic variable because idO idO is not a
syntactic value and the value restriction forces it to be monomorphic (Pierce 2002).
Therefore, the following code will not pass the type checker, since the monomorphic
type variable $\beta$ cannot be instantiated to both int and string at the same
time:

    val _ = (IDO.id 5, IDO.id "hello")  (* ill-typed *)



However, the situation can change when separate compilation is considered. 
Suppose we have modules A, B and C as follows: 

    module A = {{
      val a = IDO.id 5
    }}

    module B = {{
      val b = IDO.id "hello"
    }}

    module C = {{
      val _ = (A.a, B.b)  (* ill-typed *)
    }}



Even MLton would reject A and B when they are compiled together. As long as
we compile A and B separately, however, there is no reason to prevent them from
passing the type checker. They can be used independently. However,
they cannot be linked together because that would imply that a unification variable
is instantiated inconsistently across the module boundary. Therefore, the type
checker should reject module C even after A and B are separately compiled.
In order to detect this inconsistency across the module boundary, we may need
to track all instances of unification variables and check their consistency at
link time. So far, our EL does not have any imperative features, so we do
not need such a checking mechanism at link time. However, we will
need one if we add mutable references, since it would then be possible to assign
values of two different types to one reference cell, and the usual typing rule for the
polymorphic let-binding may be unsound.
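
The link-time check suggested above could look roughly like the following sketch. It is not part of the MLPolyR implementation: it assumes that every separately compiled module records how it instantiated each escaped unification variable, and the variable name "IDO.beta" and the function `link` are hypothetical.

```python
# Each separately compiled module records how it instantiated the
# non-generalized unification variables of the modules it imports.
# Variable names such as "IDO.beta" are hypothetical identifiers.

def link(modules):
    """Merge per-module instantiations; fail on an inconsistency."""
    seen = {}                                  # variable -> (type, module)
    for name, insts in modules.items():
        for var, ty in insts.items():
            if var in seen and seen[var][0] != ty:
                other_ty, other = seen[var]
                raise TypeError(
                    f"{var} is {other_ty} in {other} but {ty} in {name}")
            seen[var] = (ty, name)
    return {var: ty for var, (ty, _) in seen.items()}

A = {"IDO.beta": "int"}       # module A: IDO.id 5
B = {"IDO.beta": "string"}    # module B: IDO.id "hello"
```

Linking A alone or B alone succeeds, while linking both fails, which matches the intended treatment of module C.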

• Higher-order modules cause another such complication. Consider the following 
code: 

    1 template ID () = {{
    2   val idO = fn x => x
    3   val id = idO idO
    4 }}
    5
    6 module D = ID ()
    7 module E = ID ()
    8
    9 val _ = (D.id 5, E.id "hello")  (* value in question *)



Since we translate a template into an abstraction, we generate fresh type
variables whenever we encounter unbound unification variables in a template.
Under this scheme, the value in question above is accepted since D.id now
has type $\alpha \to \alpha$ and E.id has type $\beta \to \beta$ (assuming that $\alpha$ and $\beta$ are
fresh type variables). Then, when they are applied to 5 and "hello", respectively
(Line 9), $\alpha$ and $\beta$ will be instantiated to int and string, independently. However,
it might be surprising to see the type checker reject the following code:

    val _ = (D.idO 5, D.idO "hello")  (* ill-typed *)
    val _ = (E.idO 5, E.idO "hello")  (* ill-typed *)



We might expect the template ID to be translated into a core term of type
$\{idO : \forall\alpha.\ \alpha \to \alpha,\ id : \beta \to \beta\}$. Since our core language does not support
rank-1 polymorphism as in SML# (Ohori and Yoshida 1999), the translated
type will actually be $\forall\alpha.\ () \to \{idO : \alpha \to \alpha,\ id : \beta \to \beta\}$. Therefore, after
instantiation, the type of idO becomes $\alpha \to \alpha$ where $\alpha$ is no longer a polymorphic
variable but just a placeholder for type instantiation. Thus, $\alpha$ cannot
be instantiated to both int and string. This limitation can be overcome
by adopting rank-1 polymorphism in our core language or by improving our
module language up to the level of the ML module language.
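
The behaviour described in this item can be simulated with a small model of unification variables. The sketch below is an illustration only, not the MLPolyR type checker: the class `TVar`, the function `apply_fn`, and the representation of types as strings are all assumptions made for the example.

```python
import itertools

_ids = itertools.count()

class TVar:
    """A unification variable that can be bound to at most one type."""
    def __init__(self):
        self.id = next(_ids)
        self.binding = None

    def unify(self, ty):
        if self.binding is None:
            self.binding = ty
        elif self.binding != ty:
            raise TypeError(f"cannot unify {self.binding} with {ty}")

def ID():
    """Template instantiation: every call yields fresh variables.

    After instantiation both components are monomorphic, mirroring the
    translated type discussed above.
    """
    return {"idO": TVar(), "id": TVar()}

def apply_fn(component, arg_ty):
    """Result type of applying a component of type v -> v to arg_ty."""
    component.unify(arg_ty)
    return arg_ty

D, E = ID(), ID()
```

Here `D["id"]` and `E["id"]` can be used at different types because each instantiation has its own variable, whereas using `D["idO"]` at both int and string fails, as in the ill-typed lines above.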

• In our core language, we have the nice property that well-typed programs do
not have uncaught exceptions. Similarly, exceptions cannot escape up
to the module level without being caught. For example, the following module
will be ill-typed:

    module Ex = {{
      val _ = raise 'Fail ()  (* ill-typed *)
    }}

However, the exception may be caught across the module boundary. Let us see 
the module List: 

    module List = {{
      fun hd l = case l of
          [] => raise 'Empty ()
        | h :: tl => h
    }}

No exception is raised until hd is applied, and the type of hd
captures this fact: $\forall\alpha.\ \forall\rho:\{Empty\}.\ \alpha\ \mathit{list} \xrightarrow{Empty:();\,\rho} \alpha$. The exception
Empty must then be handled when an argument is supplied:

    val h = List.hd [1,2,3]  (* ill-typed *)

To guarantee exception safety, the proper handler must be prepared at a caller's 
site: 

    val h = try x = List.hd [1,2,3]
            in x
            handling 'Empty () => ...
            end

4.4 Case study: the SAL interpreter example revisited

In the previous chapter (Section 3.2), we implemented the base SAL interpreter
and its extensions mainly by using extensible cases. In this section, we revisit the
same example with the support of modules.

Base interpreter 

We reorganize the previous implementation, making use of our module language. 
Figure 4.3 shows the module version of a base interpreter for SAL. First, we structure
programs into separate units. For example, the module Envt consists of a collection 
of functions for dealing with environments: bind and empty: 

    module Envt = {{
      fun bind (a, x, e) y =
        if String.compare (x, y) = 0 then a else e y
      fun empty x =
        raise 'Fail (String.concat ["unbound variable: ", x, "\n"])
    }}

The modules Checker, BigStep and Interp are organized in a similar manner. Notice
that each module has its own namespace, so that we do not have to make up new
names such as check_case or eval_case (as in Section 3.2).

Extensions 

As the language grows, the corresponding rules such as static semantics (check) and
dynamic semantics (eval) change. Figure 4.4 shows modules for an extended
checker EChecker and an extended evaluator EBigStep. Note that we can now use
more uniform naming (i.e., check instead of echeck) due to the availability of separate
namespaces.

    (* module for the static semantics *)
    module Checker = {{
      fun bases (check, env) =
        cases 'VAR x => env x
            | 'NUM n => ()
            | 'PLUS (e1, e2) =>
                (check (env, e1);
                 check (env, e2))
            | 'LET (x, e1, e2) =>
                (check (env, e1);
                 check (Envt.bind ((), x, env), e2))

      fun check e =
        let fun run (env, e) = match e with bases (run, env)
        in (run (Envt.empty, e); e)
        end
    }}

    (* module for the evaluation semantics *)
    module BigStep = {{
      fun bases (eval, env) =
        cases 'VAR x => env x
            | 'NUM n => n
            | 'PLUS (e1, e2) => eval (env, e1) + eval (env, e2)
            | 'LET (x, e1, e2) =>
                eval (Envt.bind (eval (env, e1), x, env), e2)

      fun eval e =
        let fun run (env, e) = match e with bases (run, env)
        in run (Envt.empty, e)
        end
    }}

    (* module for the interpreter *)
    module Interp = {{
      fun interp e =
        try r = BigStep.eval (Checker.check e)
        in r
        handling 'Fail msg => (String.output msg; -1)
        end
    }}

Figure 4.3: The module version of a base interpreter.






    (* module for the extended static semantics *)
    module EChecker = {{
      fun bases (check, env) =
        cases 'IF0 (e1, e2, e3) =>
                (check (env, e1); check (env, e2); check (env, e3))
        default: Checker.bases (check, env)

      fun check e =
        let fun run (env, e) = match e with bases (run, env)
        in (run (Envt.empty, e); e)
        end
    }}

    (* module for the extended evaluation semantics *)
    module EBigStep = {{
      fun bases (eval, env) =
        cases 'IF0 (e1, e2, e3) =>
                if eval (env, e1) = 0 then eval (env, e2)
                else eval (env, e3)
        default: BigStep.bases (eval, env)

      fun eval e =
        let fun run (env, e) = match e with bases (run, env)
        in run (Envt.empty, e)
        end
    }}

    (* module for the extended interpreter *)
    module EInterp = {{
      fun interp e =
        try r = EBigStep.eval (EChecker.check e)
        in r
        handling 'Fail msg => (String.output msg; -1)
        end
    }}

Figure 4.4: Implementation for an extended interpreter.
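
The way an extension adds one case and defaults to the base cases can be imitated with open recursion in a dynamically typed language. The following Python sketch is an analogy, not a translation of MLPolyR's first-class cases: terms are encoded as tagged tuples, a dictionary of handlers plays the role of a `cases` expression, and the names `base_cases`, `ext_cases` and `make_eval` are introduced here for illustration. Unlike MLPolyR, nothing is checked statically, so an unhandled tag only fails at run time.

```python
# Terms are tagged tuples, e.g. ("PLUS", ("NUM", 1), ("NUM", 2)).

def bind(a, x, env):
    return lambda y: a if y == x else env(y)

def empty(x):
    raise KeyError(f"unbound variable: {x}")

def base_cases(ev, env):
    """Cases of the base evaluator, open over the recursive call `ev`."""
    return {
        "VAR": lambda x: env(x),
        "NUM": lambda n: n,
        "PLUS": lambda e1, e2: ev(env, e1) + ev(env, e2),
        "LET": lambda x, e1, e2: ev(bind(ev(env, e1), x, env), e2),
    }

def ext_cases(ev, env):
    """Extension: one new case; everything else defaults to the base."""
    return {
        **base_cases(ev, env),
        "IF0": lambda e1, e2, e3:
            ev(env, e2) if ev(env, e1) == 0 else ev(env, e3),
    }

def make_eval(cases):
    """Tie the recursive knot, like `run` in the figures."""
    def run(env, e):
        tag, *args = e
        return cases(run, env)[tag](*args)
    return lambda e: run(empty, e)

eval_base = make_eval(base_cases)
eval_ext = make_eval(ext_cases)
```

Because `base_cases` receives the recursive evaluator as a parameter, sub-terms nested inside base constructs are still evaluated with the extended cases when `eval_ext` is used.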



Independent extensions 

Moreover, we can utilize templates, i.e., "module functions" which take concrete
modules as arguments. The result is a composite module:

    template InterpFun (C, E) = {{
      fun interp e =
        try r = E.eval (C.check e)
        in r
        handling 'Fail msg => (String.output msg; -1)
        end
    }}



Then, we can instantiate different interpreters depending on their parameters: 

    module I = InterpFun (Checker, BigStep)
    module I' = InterpFun (EChecker, EBigStep)

In this way, it becomes possible to combine independently developed extensions
(e.g., EChecker and EBigStep) so that they can be used jointly.
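
A template can be read as an ordinary function from component records to a composite record. The sketch below illustrates this reading in Python under loud assumptions: the components `Checker`, `BigStep` and `Doubling` are toy stand-ins rather than the modules of Figures 4.3 and 4.4, the exception class `Fail` is hypothetical, and no constraints are checked statically.

```python
# A template as a function from component records to a composite record.

class Fail(Exception):
    """Hypothetical stand-in for the 'Fail exception."""

def InterpFun(C, E):
    def interp(e):
        try:
            return E["eval"](C["check"](e))
        except Fail as msg:
            print(msg)
            return -1
    return {"interp": interp}

def check(e):
    # toy checker: accepts integers only
    if not isinstance(e, int):
        raise Fail("not a number")
    return e

Checker = {"check": check}
BigStep = {"eval": lambda e: e}
Doubling = {"eval": lambda e: 2 * e}     # an alternative toy evaluator

I1 = InterpFun(Checker, BigStep)
I2 = InterpFun(Checker, Doubling)
```

Any record that provides the component the template uses (`check` for C, `eval` for E) can be plugged in, which is the informal counterpart of the constraints inferred for template arguments.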



CHAPTER 5 

BEYOND THE VERY LARGE: FEATURE-ORIENTED 

PROGRAMMING 



5.1 Introduction 



Previous work on extensible compilers has proposed new techniques for easily
adding extensions to existing programming languages and their compilers. For example,
JaCo is an extensible compiler for Java based on extensible algebraic types
(Zenger and Odersky 2001, 2005). The Polyglot framework implements an extensible
compiler where even changes of compilation phases and manipulation of internal
abstract syntax trees are possible (Nystrom et al. 2003). Aspect-oriented concepts
(i.e., cross-cutting concerns) are also applied to extensible compiler construction
(Wu et al. 2005). While all this work successfully demonstrates that a base compiler
can be extended easily, most of these existing solutions pay no special attention to
the set of extensions they produce. Sometimes all the extensions can be integrated
together to become a new version of the system, in which case these existing
solutions work well.

However, there are many cases where software changes cannot be merged back,
so that different versions evolve and begin to coexist independently. Moreover, there
are even situations where such divergence is planned from the beginning. A
marketing plan may introduce a product lineup with multiple editions. As mentioned
in Chapter 1, Windows Vista, which ships in six editions, is such an example. Unless
we carefully manage each change in different editions, multiple versions that
originate from one source start to coexist separately. They quickly become so incompatible
that they require separate maintenance, even though much of their code is duplicated.
This quickly leads to a maintenance nightmare. In such a case, the role of
programming languages becomes limited and, instead, we need a way to manage variability
in the product lineup.

One possible way of addressing these issues is to adopt product line engineering
(Kang et al. 2002; Lee et al. 2002; SEI 2008). It defines a software product
line to be a set of software systems that share a common set of features with variations.
Therefore, a product line is expected to be developed from a common set of software
components (called core assets) on the same software architecture. The paradigm encourages
developers to focus on developing a set of products, rather than on developing one
particular product. Products are built from core assets rather than from scratch, so
mechanisms for managing variability are essential.

In many cases, however, product line methods do not impose any specific
synthesis mechanisms on product line implementation, so implementation details are left
to developers. As a consequence, feature-oriented programming (FOP) emerges as
an attempt to realize this paradigm at the code level. For example, AHEAD,
FeatureC++ and FFJ support the composition of features in various ways
(Batory 2004; Apel et al. 2005, 2008).

Although FOP has become popular in product line engineering, comparative
studies of the corresponding mechanisms for product line implementation have rarely been
conducted. Lopez-Herrejon et al. compared five technologies in order to evaluate
feature modularization (Lopez-Herrejon et al. 2005), but their experiment was conducted
entirely at the code level, which led them to conclude that a technology-independent
model would be needed in order to reason about product lines.

In this section, we first propose a two-way extensible interpreter as a canonical 
example for product line engineering. Our intention with this example is to provide 
a framework for comparison of language support for product line implementation. 
Then, we identify some issues that an implementation technique is expected to resolve, 
illustrate how the MLPolyR language can be used to implement a two-way extensible 
interpreter, and evaluate how effective our solution is. 



5.2 A two-way extensible interpreter as a generator 



We have seen how the MLPolyR language implements a two-way extensible interpreter
in various ways. Similarly, many programming language solutions have already been
developed to solve the dilemma caused by simultaneous two-way extensibility. For
example, Zenger and Odersky present a hybrid language specifically designed to solve
this issue (Zenger and Odersky 2005).



Most of these existing solutions, however, do not consider the set of extensions
they produce. For example, assume one wants to build an interpreter $I$, which is the
composition of the combinators $\mathit{eval}$ (realizing the evaluation semantics) and $\mathit{check}$
(realizing the static semantics), where $\circ$ means function composition:

    $I = \mathit{eval} \circ \mathit{check}$

The evaluation stage could also be implemented by the machine semantics $\mathit{eval}_m$
instead of the evaluation semantics $\mathit{eval}$:

    $I_m = \mathit{eval}_m \circ \mathit{check}$

Optionally, the combinator $\mathit{opt}$, which performs constant folding, may be inserted to
build an optimized interpreter $I_{opt}$:

    $I_{opt} = \mathit{eval} \circ \mathit{opt} \circ \mathit{check}$

As the base language grows to support a conditional term, $\mathit{eval}$, $\mathit{opt}$ and $\mathit{check}$ also
evolve to constitute a new interpreter $I'_{opt}$:

    $I'_{opt} = \mathit{eval}' \circ \mathit{opt}' \circ \mathit{check}'$

Since these interpreters have a lot in common, we should try to understand them as
a family of interpreters. Therefore, the two-way extensible interpreter turns out to
be a generator of a program family of SAL interpreters. While this two-dimensional
extension problem has been generally studied within the context of how to easily
extend base code in a type-safe manner, we focus on the generativity aspect of such
solutions. Moreover, our extensible interpreter example enables us to emphasize the
overall structure of the system, the so-called software architecture (Garlan and Shaw
1994). Hence, we can analyze variations in terms of architectural and component-level
variations, rather than in terms of operations or data, which are rather vague and
general. Architectural variation captures the inclusion or exclusion of certain functionality.
For example, the extended interpreter includes an optimization phase while the base
interpreter does not. Component-level variation captures components that may have
multiple alternative implementations. For example, every interpreter has its own evaluator
which implements either the evaluation semantics or the machine semantics.

5.3 Feature-oriented product line engineering

Since we set up a two-way extensible interpreter to generate a family of products, it 
is natural to apply product line engineering for better support of their development. 
Among various product line approaches, we adopt FORM product line engineering 
for the following reasons: 



• The method relies on a feature-based model which provides adequate means for
reasoning about product lines (Kang et al. 2002).

• The method supports architecture design, which plays an important role in
bridging the gap between the concepts at the requirement level and their
realization at the code level by deciding how variations are modularized by means
of architectural components (Noda and Kishi 2008).

• The method consists of a well-defined development process which enables us to
easily identify implementation-dependent phases.

To let us focus on product line implementation as opposed to
implementation-independent processes, we highlight the former as shown in Figure 5.1. The area
surrounded by dashed lines is the subject of our comparative study. In this section,
we will give an overview of overall engineering activities for a family of the SAL inter- 
preters. Then, in the following section, we will show how to refine conceptual models 
into concrete models with the mechanisms that the MLPolyR language provides. 



Figure 5.1: Development process (adopted from FORM (Kang et al. 2002)).

5.3.1 Product line analysis

We perform commonality and variability analysis for the family of SAL
interpreters. We can easily consider features in the base interpreter as commonalities and
features exclusive to some extensions as variations. Then, we determine what
causes these variations. For example, we can clearly tell that the choice of a set of
language constructors differentiates interpreters. Similarly, the choice of evaluation
strategy makes an impact. Optimization could optionally be performed. We
refer to these factors that differentiate products as features (Kang et al. 2002, 1998).
Figure 5.2 shows the feature model according to our product line analysis.



5.3.2 Product line architecture design 

Architecture design involves identifying conceptual components and specifying their
configuration. Based on the product line analysis, we define two reference
architectures by mapping each combinator to a distinct component in Figure 5.3. A
component can be either generic or static. A generic component encapsulates variations
when a certain aspect of this component varies in different products. The evaluator
component is a typical example. A static component performs common
functionality shared across family members.

Figure 5.3: Reference architectures.

During this phase, we have to not only identify components but also define inter- 
faces between components: 

    $\mathit{checker} : \mathit{term} \to \mathit{term}$
    $\mathit{optimizer} : \mathit{term} \to \mathit{term}$
    $\mathit{evaluator} : \mathit{term} \to \mathit{value}$



As usual, the arrow symbol $\to$ is used to specify a function type. In our example,
components act like pipes in a pipe-and-filter architecture style, so all interface
information is captured by the type. By using the above components, we can specify the
overall structure of various interpreters:

    $\mathit{interp} = \mathit{evaluator} \circ \mathit{checker}$

    $\mathit{interpOpt} = \mathit{evaluator} \circ \mathit{optimizer} \circ \mathit{checker}$

5.3.3 Product line component design 

Next, we identify conceptual components which are constituents of a conceptual archi- 
tecture. A conceptual component can have multiple implementations. For example, 
there are many versions of the evaluator component depending on the evaluation 
strategy: 

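    $\mathit{eval} : \mathit{term} \to \mathit{value}$
    $\mathit{eval}_m : \mathit{term} \to \mathit{value}$

At the same time, the language term can be extended to become term', which is an
extension of term (for example, to support conditionals):

    $\mathit{eval}' : \mathit{term}' \to \mathit{value}$
    $\mathit{eval}'_m : \mathit{term}' \to \mathit{value}$

Similarly, check and check' can be specified as follows:

    $\mathit{check} : \mathit{term} \to \mathit{term}$
    $\mathit{check}' : \mathit{term}' \to \mathit{term}'$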

For the optimizer component, there are many possible variations due to inclusion or 
exclusion of various individual optimization steps (here: constant folding and short- 
circuiting) and due to the variations in the underlying term language (here: basic 
and extended): 

5.3.4 Product analysis

Product engineering starts with analyzing the requirements provided by the user and
finding a corresponding set of required features from the feature model. Assuming we
are to build four kinds of interpreters, we need four different feature selections:

    $FS(I) = \{\text{Evaluation semantics}\}$

    $FS(I_m) = \{\text{Machine semantics}\}$

    $FS(I_{opt}) = \{\text{Machine semantics, Optimizer, Constant folding}\}$

    $FS(I'_{opt}) = \{\text{Conditional, Evaluation semantics, Optimizer, Constant folding, Short-circuit}\}$

Here, the function $FS$ maps a product to the set of its required
features. (For brevity, only non-mandatory features are shown.)

During product engineering, these selected feature sets give advice on the
selection among both reference architectures and components. Figure 5.4 shows the overall
product engineering process where the reference architecture interpOpt gets selected,
guided by the presence of the Optimizer feature. Feature sets also show which
components need to be selected and how they would be instantiated at the component
level. For example, the presence of the Constant folding feature guides us to choose
the component optimizer with the implementation $\mathit{opt}_{cons}$. Similarly, the presence of
the Machine semantics feature picks the implementation $\mathit{eval}_m$ instead of $\mathit{eval}$. The
target product would be instantiated by assembling such selections.
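
The selection process described above can be sketched as a function from a feature set to a composed product. This is an illustrative sketch only: the component bodies are placeholders that merely record which component ran, and the names `instantiate` and `compose` are hypothetical, not part of FORM or MLPolyR.

```python
# Core assets: alternative implementations of each component.
# The bodies are placeholders that only tag their input.

def check(t):      return t + ["check"]
def opt_cons(t):   return t + ["opt_cons"]
def eval_big(t):   return t + ["eval"]
def eval_m(t):     return t + ["eval_m"]

def compose(*fs):
    """compose(f, g, h)(x) == f(g(h(x)))"""
    def run(x):
        for f in reversed(fs):
            x = f(x)
        return x
    return run

def instantiate(features):
    """Pick a reference architecture and components from a feature set."""
    evaluator = eval_m if "Machine semantics" in features else eval_big
    if "Optimizer" in features:                       # architecture interpOpt
        if "Constant folding" not in features:
            raise ValueError("no optimizer implementation selected")
        return compose(evaluator, opt_cons, check)
    return compose(evaluator, check)                  # architecture interp
```

The Optimizer feature decides the architecture, while the remaining features decide which implementation fills each component slot.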

Figure 5.4: Product engineering.

5.4 Issues in product line implementation 

During the product line asset development process, we obtain reference models which 
represent architectural and component-level variations. Such variations should be 
realized at the code level. The first step is to refine conceptual architectures into 
concrete architectures which describe how to configure conceptual components. Then, 
product line component design involves realization of conceptual components using 
the proper product feature delivery methods. This section discusses some issues that 
surface during product line implementation. 



Product line architecture implementation 

In order to specify concrete reference architectures, we have to not only identify
conceptual components but also define interfaces between components. Moreover, since
there may be multiple reference architectures, it would be convenient to have
mechanisms for abstracting architectural variations, capturing the inclusion or exclusion
of certain components. Therefore, any adequate implementation technique should be
able to provide mechanisms for:

• Declaration of required conceptual components (checker, optimizer and evaluator)
and their interfaces,

• Specification of the base reference architecture interp and its optimized
counterpart interpOpt by using such conceptual components.

Product line component implementation

This phase involves realization of conceptual components. The main challenge of this 
phase is in how to implement generic components that encapsulate component-level 
variations. Such variations could be in the form of either code extension or code 
substitution. Any solution to the traditional expression problem can be a mechanism 
to implement code extension. For our running example, the following pairs correspond 
to code extension: 

• $\mathit{check}$ and $\mathit{check}'$

• $\mathit{eval}$ and $\mathit{eval}'$

• $\mathit{eval}_m$ and $\mathit{eval}'_m$

• $\mathit{opt}_{cons}$ and $\mathit{opt}'_{cons}$

• $\mathit{opt}_{cons}$ and $\mathit{opt}'_{cons+short}$

• $\mathit{opt}'_{short}$ and $\mathit{opt}'_{cons+short}$

Code substitution provides another form of variation at the component level when
two different implementations provide interchangeable functionality. For example,
$\mathit{eval}$ and $\mathit{eval}_m$ both implement the evaluator component, but neither is an extension
of the other. Language abstraction mechanisms are expected to handle this case
elegantly. For our running example, the corresponding scenarios are as follows:

• $\mathit{eval}$ and $\mathit{eval}_m$

• $\mathit{eval}'$ and $\mathit{eval}'_m$

Product engineering 

Based on the product analysis, a product is instantiated by assembling product
line core assets. For our running example, the evaluated techniques should be able to
instantiate four interpreters ($I$, $I_m$, $I_{opt}$, $I'_{opt}$) based on the selected feature set.

5.5 Language support for product line implementation

In this section, we illustrate how the MLPolyR language can be used to implement
a two-way extensible interpreter. First, we show how each issue identified in the
previous section is resolved by the mechanisms that MLPolyR provides. A
comparison with other product line implementation techniques follows.

Product line architecture implementation 

Each component in a reference architecture is mapped to an MLPolyR module.
As specified in Section 5.4, we first define types (or signatures) of the
components of interest based on the outcome of product line architecture design
(Section 5.3.2):

    Checker   : {{ check : term → term, ... }}
    Optimizer : {{ opt : term → term, ... }}
    Evaluator : {{ eval : term → int, ... }}

where ... indicates that there may be more parts in a component, but they are not
our concern. In practice, we do not have to write such interfaces explicitly since
the type checker infers the principal types. Then, by using these conceptual modules
(Checker, Optimizer and Evaluator), we can define two reference architectures:

    module Interp = {{
      val interp = fn e => Evaluator.eval (Checker.check e)
    }}

    module InterpOpt = {{
      val interp = fn e => Evaluator.eval (Optimizer.opt (Checker.check e))
    }}



Alternatively, like functors in SML, we can use a parameterization technique called
a template, which takes concrete modules as arguments and instantiates a composite
module:

    1 template InterpFun (C, E) = {{
    2   val interp = fn e => E.eval (C.check e)
    3 }}
    4
    5 template InterpOptFun (C, O, E) = {{
    6   val interp = fn e => E.eval (O.opt (C.check e))
    7 }}

where C, O and E represent Checker, Optimizer and Evaluator, respectively. Their
signatures are captured as constraints by the type checker. For example, the type
checker infers the constraint that the module C should have a component named check
which has a type of $\alpha \to \beta$, and $\beta$ should be the argument type of either the module
E (Line 1) or the module O (Line 5).

The second approach with templates supports more code reuse because a reference
architecture becomes polymorphic, i.e., parameterized not only over the values but
also over the types of its components. As long as components satisfy the constraints
that the type checker computes, any components can be plugged into a reference
architecture. For example, for the argument C, either the base module Check or its
extension ECheck can be applied to the template InterpFun.

Product line component implementation 

Modules in MLPolyR implement components. In order to manage component-level
variations, we have to deal with both code extension and code substitution,
as discussed in Section 5.4. For example, we will see multiple implementations of the
component Evaluator:



BigStep : {{ eval : term -> int, ... }}
Machine : {{ eval : term -> int, ... }}
EBigStep : {{ eval : term' -> int, ... }}
EMachine : {{ eval : term' -> int, ... }}



where term represents the type of the base constructors and term' that of the extension.
BigStep and EBigStep implement the evaluation semantics and its extension, while
Machine and EMachine implement the machine semantics and its extension. Note
that the pair BigStep and EBigStep, and likewise the pair Machine and EMachine,
correspond to code extension, while the pair BigStep and Machine corresponds to
code substitution.

Code extension is supported by first-class extensible cases, as we already studied
in Section 3. Figure 5.5 shows how such extensions are made. In an extension, only
the new case is handled (Lines 19-21) and the default explicitly refers to the original
set of cases represented by BigStep.bases (Line 22).
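
To make the open-recursion pattern concrete, the following Python sketch imitates it with
dictionaries of handlers. It is a rough analogue under stated assumptions, not MLPolyR:
terms are tuples such as ("PLUS", e1, e2), a set of cases is a dictionary from constructor
tags to functions, and the helpers bind, empty, match and make_eval are hypothetical names
introduced for this illustration. An unhandled constructor surfaces here as a run-time
KeyError, whereas the MLPolyR type checker rejects such a program statically.

```python
def bind(value, name, env):
    """Hypothetical counterpart of Envt.bind: extend env with name -> value."""
    return lambda x: value if x == name else env(x)


def empty(name):
    """Hypothetical counterpart of Envt.empty: no variable is bound."""
    raise KeyError(name)


def match(term, cases):
    """Dispatch on the constructor tag, in the spirit of 'match e with cases'."""
    tag, *args = term
    return cases[tag](*args)


def big_step_bases(ev, env):
    """Base cases; recursive calls go through the ev parameter, not a fixed function."""
    return {
        "VAR": lambda x: env(x),
        "NUM": lambda n: n,
        "PLUS": lambda e1, e2: ev(env, e1) + ev(env, e2),
        "LET": lambda x, e1, e2: ev(bind(ev(env, e1), x, env), e2),
    }


def ebig_step_bases(ev, env):
    """Extension: only the new case is written; the rest defers to the base cases."""
    new = {
        "IF0": lambda e1, e2, e3: ev(env, e2) if ev(env, e1) == 0 else ev(env, e3),
    }
    return {**big_step_bases(ev, env), **new}


def make_eval(bases):
    """Close the recursion by handing the evaluator to its own set of cases."""
    def run(env, e):
        return match(e, bases(run, env))
    return lambda e: run(empty, e)


big_step_eval = make_eval(big_step_bases)
ebig_step_eval = make_eval(ebig_step_bases)
```

Because big_step_bases calls back through ev, the inherited 'PLUS and 'LET cases
automatically evaluate sub-terms that contain the new constructor once the recursion is
closed over the extended cases.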

Code substitution, as another form of variation at the component level, does not
cause any trouble. For example, Figure 5.6 shows the module Machine, which implements
the machine semantics (i.e., eval_m). Just as EBigStep extends BigStep, EMachine
extends Machine through two extensible cases (Lines 27 and 31). In our example, two
different implementations (BigStep and Machine) provide interchangeable functionality,
but neither is an extension of the other, so they are implemented independently.
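
The machine-style alternative can be sketched in the same spirit. The Python code below is
an illustrative approximation under stated assumptions, not the MLPolyR implementation:
the continuation K is a list of frame tuples, each set of cases is a dictionary of
handlers, and bind, empty and make_eval are hypothetical helpers. The extended machine
adds one expression case and one frame case and inherits everything else.

```python
def bind(value, name, env):
    """Hypothetical counterpart of Envt.bind: extend env with name -> value."""
    return lambda x: value if x == name else env(x)


def empty(name):
    """Hypothetical counterpart of Envt.empty: no variable is bound."""
    raise KeyError(name)


def machine_ecases(K, env, estate, vstate):
    """Expression cases: either produce a value or push a frame onto K."""
    return {
        "VAR": lambda x: vstate(env(x), K),
        "NUM": lambda n: vstate(n, K),
        "PLUS": lambda e1, e2: estate([("PLUSl", e2, env)] + K, env, e1),
        "LET": lambda x, e1, e2: estate([("LETl", x, e2, env)] + K, env, e1),
    }


def machine_vcases(v, K, estate, vstate):
    """Frame cases: decide how to continue once the value v is available."""
    return {
        "PLUSl": lambda e, env: estate([("PLUSr", v)] + K, env, e),
        "LETl": lambda x, e, env: estate(K, bind(v, x, env), e),
        "PLUSr": lambda v1: vstate(v1 + v, K),
    }


def emachine_ecases(K, env, estate, vstate):
    """Extension: one new expression case, everything else inherited."""
    new = {"IF0": lambda e1, e2, e3: estate([("IF0l", e2, e3, env)] + K, env, e1)}
    return {**machine_ecases(K, env, estate, vstate), **new}


def emachine_vcases(v, K, estate, vstate):
    """Extension: one new frame case, everything else inherited."""
    new = {"IF0l": lambda e2, e3, env: estate(K, env, e2 if v == 0 else e3)}
    return {**machine_vcases(v, K, estate, vstate), **new}


def make_eval(ecases, vcases):
    """Tie the two mutually recursive states together over the given cases."""
    def estate(K, env, e):
        tag, *args = e
        return ecases(K, env, estate, vstate)[tag](*args)

    def vstate(v, K):
        if not K:
            return v
        head, tl = K[0], K[1:]
        tag, *args = head
        return vcases(v, tl, estate, vstate)[tag](*args)

    return lambda e: estate([], empty, e)


machine_eval = make_eval(machine_ecases, machine_vcases)
emachine_eval = make_eval(emachine_ecases, emachine_vcases)
```

Since machine_eval exposes the same term-to-integer interface as a big-step evaluator,
either one can be supplied wherever an evaluator component is expected.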

Analogously, we can implement the remaining two conceptual components, Checker
and Optimizer. For Checker, we have:

Check : {{ check : term -> term, ... }}
ECheck : {{ check : term' -> term', ... }}

where each implements the concrete component check and check', respectively. For
the component Optimizer, we have:

COptimizer : {{ opt : term -> term, ... }}
ECOptimizer : {{ opt : term' -> term', ... }}
ESOptimizer : {{ opt : term' -> term', ... }}
ECSOptimizer : {{ opt : term' -> term', ... }}

where each implements the concrete component opt_cons, opt'_cons, opt'_short and the
combination of the latter two,
 1 (* module for the evaluation semantics *)
 2 module BigStep = {{
 3   fun bases (eval, env) =
 4     cases 'VAR x => env x
 5         | 'NUM n => n
 6         | 'PLUS (e1, e2) => eval (env, e1) + eval (env, e2)
 7         | 'LET (x, e1, e2) =>
 8             eval (Envt.bind (eval (env, e1), x, env), e2)
 9
10   fun eval e =
11     let fun run (env, e) = match e with bases (run, env)
12     in run (Envt.empty, e)
13     end
14 }}
15
16 (* module for the extended evaluation semantics *)
17 module EBigStep = {{
18   fun bases (eval, env) =
19     cases 'IF0 (e1, e2, e3) =>
20             if eval (env, e1) = 0 then eval (env, e2)
21             else eval (env, e3)
22     default: BigStep.bases (eval, env)
23
24   fun eval e =
25     let fun run (env, e) = match e with bases (run, env)
26     in run (Envt.empty, e)
27     end
28 }}

Figure 5.5: The module BigStep realizes the evaluation semantics (eval), and the
module EBigStep realizes the extended evaluation semantics (eval') by defining only a
new case 'IF0. In an extension, only the new case is handled (Lines 19-21) and the default
explicitly refers to the original set of cases represented by BigStep.bases (Line
22). EBigStep.bases can then handle five cases, including 'IF0. We obtain a new
evaluator EBigStep.eval by closing the recursion, i.e., by applying bases to the evaluator
itself (Line 25). Note that the helper function run is applied instead of eval so
that an initial environment can be passed in Line 26.
 1 (* module for the machine semantics *)
 2 module Machine = {{
 3   fun ecases (K, env, estate, vstate) =
 4     cases 'VAR x => vstate (env x, K)
 5         | 'NUM n => vstate (n, K)
 6         | 'PLUS (e1, e2) => estate ('PLUSl (e2, env)::K, env, e1)
 7         | 'LET (x, e1, e2) => estate ('LETl (x, e2, env)::K, env, e1)
 8   and vcases (v, K, estate, vstate) =
 9     cases 'PLUSl (e, env) => estate (('PLUSr v)::K, env, e)
10         | 'LETl (x, e, env) => estate (K, Envt.bind (v, x, env), e)
11         | 'PLUSr v' => vstate (v'+v, K)
12
13   fun estate (K, env, e) = match e with ecases (K, env, estate, vstate)
14   and vstate (v, K) =
15     case K of
16       [] => v
17     | h::tl => match h with vcases (v, tl, estate, vstate)
18
19   fun eval e = estate ([], Envt.empty, e)
20 }}
21
22 (* module for the extended machine semantics *)
23 module EMachine = {{
24   fun ecases (K, env, estate, vstate) =
25     cases 'IF0 (e1, e2, e3) =>
26             estate ('IF0l (e2, e3, env)::K, env, e1)
27     default: Machine.ecases (K, env, estate, vstate)
28   and vcases (v, K, estate, vstate) =
29     cases 'IF0l (e2, e3, env) =>
30             if v = 0 then estate (K, env, e2) else estate (K, env, e3)
31     default: Machine.vcases (v, K, estate, vstate)
32
33   fun estate (K, env, e) = match e with ecases (K, env, estate, vstate)
34   and vstate (v, K) =
35     case K of
36       [] => v
37     | h::tl => match h with vcases (v, tl, estate, vstate)
38
39   fun eval e = estate ([], Envt.empty, e)
40 }}

Figure 5.6: The module Machine realizes the machine semantics (eval_m), and the
module EMachine realizes the extended machine semantics (eval'_m) by defining only
the new cases 'IF0 and 'IF0l.
respectively.

Product engineering 

In Section 5.3.4, we defined four interpreters (I, Im, Iopt and I'opt), differentiated by
their feature selections. Each is instantiated by selecting a proper architecture (either
InterpFun or InterpOptFun) and choosing its components (BigStep or Machine,
etc.) with implicit advice from the selected feature set. For example:

• When the feature set is FS(I), the reference architecture InterpFun gets selected
since the Optimizer feature is not in the set. Then, the proper components are
selected and instantiated. For example, the presence of the Evaluation semantics
feature guides us to choose the component BigStep instead of Machine. Therefore,
we instantiate the interpreter I as follows:

module I = InterpFun (Checker, BigStep) 

• When the feature set is FS(Im), the reference architecture InterpFun gets chosen.
Here, the components Machine and Check are selected because of the presence
of the Machine semantics feature. Therefore, we instantiate the interpreter Im as
follows:

module Im = InterpFun (Checker, Machine)

• When the feature set is FS(Iopt), the reference architecture InterpOptFun is chosen
since the Optimizer feature is in the set. Then, again, the proper components
get selected and instantiated. Here, the presence of the Constant folding
feature guides us to choose the component COptimizer, and the presence of
the Machine semantics feature leads us to instantiate the component Machine.
Therefore, we instantiate the interpreter Iopt as follows:

module Iopt = InterpOptFun (Checker, COptimizer, Machine)

• When the feature set is FS(I'opt), the reference architecture InterpOptFun is chosen.
As far as the components are concerned, the presence of the Conditional and
Evaluation semantics features guides us to choose the component EBigStep. Similarly,
the presence of the Optimizer, Conditional, Constant folding and Short-circuit
features forces the use of the component ECSOptimizer. Therefore, we instantiate the
interpreter I'opt as follows:

module I'opt = InterpOptFun (EChecker, ECSOptimizer, EBigStep)
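
The selection steps in the four cases above can be summarized as a small decision
procedure. The Python sketch below is a hypothetical illustration only: the lookup tables
encode our reading of the four examples, the function instantiate and the stub builder
component are names invented here, and each stub merely appends its own name to a trace so
that the chosen composition becomes visible. It does not validate feature sets.

```python
from types import SimpleNamespace


def component(name, field):
    """Hypothetical stub: its single field appends the component name to a trace."""
    return SimpleNamespace(**{field: lambda trace: trace + [name]})


def interp_fun(C, E):
    """Analogue of InterpFun (C, E)."""
    return SimpleNamespace(interp=lambda e: E.eval(C.check(e)))


def interp_opt_fun(C, O, E):
    """Analogue of InterpOptFun (C, O, E)."""
    return SimpleNamespace(interp=lambda e: E.eval(O.opt(C.check(e))))


# Keys: (is the Conditional extension selected?, selected semantics feature).
EVALUATORS = {
    (False, "Evaluation semantics"): "BigStep",
    (False, "Machine semantics"): "Machine",
    (True, "Evaluation semantics"): "EBigStep",
    (True, "Machine semantics"): "EMachine",
}

# Keys: (is the Conditional extension selected?, selected optimization features).
OPTIMIZERS = {
    (False, frozenset({"Constant folding"})): "COptimizer",
    (True, frozenset({"Constant folding"})): "ECOptimizer",
    (True, frozenset({"Short-circuit"})): "ESOptimizer",
    (True, frozenset({"Constant folding", "Short-circuit"})): "ECSOptimizer",
}


def instantiate(features):
    """Pick a reference architecture and its components from a feature set."""
    extended = "Conditional" in features
    semantics = next(
        f for f in ("Evaluation semantics", "Machine semantics") if f in features
    )
    checker = component("EChecker" if extended else "Checker", "check")
    evaluator = component(EVALUATORS[(extended, semantics)], "eval")
    if "Optimizer" not in features:
        return interp_fun(checker, evaluator)
    kinds = frozenset(features) & {"Constant folding", "Short-circuit"}
    optimizer = component(OPTIMIZERS[(extended, kinds)], "opt")
    return interp_opt_fun(checker, optimizer, evaluator)
```

Calling interp on an empty trace returns the names of the selected components in
application order, which makes it easy to compare the outcome with the instantiations
listed above.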

5.6 Evaluation 

Although they are not designed specifically for feature-oriented programming,
many language constructs can be used to manage variability in the context of product
line implementation. For example, various mechanisms including classes, aspects and
modules can support abstraction of features. They also support extension mechanisms
such as sub-classing, macro processing, aspect-weaving or parameterizing, which can
be used to modularize feature composition. Among the various techniques, three
representative implementation approaches are found frequently in the
product line literature (Gacek and Anastasopoulos 2001; Kästner et al. 2008).



The annotative approach 

As the name suggests, annotative approaches implement features using some form
of annotations. Typically, preprocessors, e.g., macro systems, have been used in many
literature examples as the feature product delivery method (Kang et al. 1998, 2005).
For example, the macro language in FORM determines the inclusion or exclusion of
code segments based on the feature selection:

1 val interp =
2   fn e => Evaluator.eval
3   $IF (;:$Optimizer) [
4     (Optimizer.opt
5       (Check.check e))
6   ][
7     (Check.check e)
8   ]



Depending on the presence of the Optimizer feature, one of the two blocks (Lines 4-5 or
Line 7) will be selected.

Macro languages have the advantage that they can be mixed easily with
any target programming language. However, feature-specific segments are scattered
across multiple classes, so code easily becomes complicated. Saleh and Gomaa
propose a feature description language (Saleh and Gomaa 2005). Its syntax looks
similar to the C/C++ preprocessor, but it supports separation of concerns by
modularizing feature-specific code in a separate file. In the annotative approach, however,
target compilers do not understand the macro language, so errors in
feature code segments cannot be detected until the relevant feature sets are selected and
the corresponding code segments are compiled.
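
The late detection of errors can be demonstrated with a toy preprocessor. The Python
sketch below is a hypothetical illustration and does not reproduce FORM's macro syntax:
annotated source is modeled as a list of (condition, text) pairs, preprocess keeps the
segments whose condition matches the feature set, and Python's built-in compile stands in
for the target compiler. The segment guarded by the Optimizer feature deliberately
contains a syntax error.

```python
# Annotated source as (condition, text) pairs. A condition is either None
# (always included) or a pair (feature, required_presence). This encoding is
# an assumption made for the illustration.
SEGMENTS = [
    (None, "def interp(e, check, opt, evaluate):\n"),
    # Latent error: the closing parenthesis is missing in this segment.
    (("Optimizer", True), "    return evaluate(opt(check(e))\n"),
    (("Optimizer", False), "    return evaluate(check(e))\n"),
]


def preprocess(segments, features):
    """Keep exactly the segments whose condition holds for the feature set."""
    return "".join(
        text
        for cond, text in segments
        if cond is None or (cond[0] in features) == cond[1]
    )


def compiles(source):
    """Report whether the generated product is syntactically valid Python."""
    try:
        compile(source, "<product>", "exec")
        return True
    except SyntaxError:
        return False


without_optimizer = preprocess(SEGMENTS, set())
with_optimizer = preprocess(SEGMENTS, {"Optimizer"})
```

The product generated without the Optimizer feature compiles cleanly, so the defect stays
hidden until a feature set that includes Optimizer is selected and compiled.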
The compositional approach

To take advantage of current compiler technology, including static typing and
separate compilation, we need native language support. Therefore, language-oriented
proposals generally take compositional approaches by providing better support for
feature modularity (Lopez-Herrejon et al. 2005). FeatureC++ (Apel et al. 2005),
AHEAD (Batory 2004) and AspectJ are such language extensions.
In this approach, features are implemented as distinct units, which are then
combined to form a product. Aspect-oriented programming has become popular
as a way of implementing the compositional approach (Lee et al. 2006; Cho et al.
2008). The main idea is to implement variations as separate aspects and to obtain
each product by weaving base code and aspect code. Our extensible cases provide
similar composability. Furthermore, our module language also supports extensible
modules, which make large-scale code reusable. Note that composition in aspects
does not provide separate compilation, so base code must be re-type-checked,
recompiled, or both for every composition. In contrast, our module system
supports separate compilation.



The parameterization approach 

The idea of parameterized programming is to implement the common part once and to
parameterize variations, so that different products can be instantiated by assigning
distinct values as parameters. Functors, as provided by Standard ML (SML), are
a typical example in that they can be parameterized on values, types and other
modules (Appel and MacQueen 1991). The SML module system has been demonstrated
to be powerful enough to manage variations in the context of product lines
(Chae and Blume 2008). However, its type system sometimes imposes restrictions
which require code duplication between functions on data types. Many proposals
to overcome this restriction have been presented. For example, MLPolyR proposes
extensible cases (Blume et al. 2006), and OCaml proposes polymorphic variants
(Garrigue 2000).



Similarly, templates in C++ provide parameterization over types and have been
extensively studied in the context of programming families (Czarnecki and Eisenecker
2000). Recently, an improvement that would provide better support for generic
programming has been proposed (Dos Reis and Stroustrup 2006). Originally, Java and
C# did not support parameterized types, but now both support similar concepts
(Torgersen 2004).



Sometimes the parameterization approach is criticized for its difficulty in identifying
variation points and defining parameters (Gacek and Anastasopoulos 2001).
However, systematic reasoning (e.g., product line analysis done by product line
architects) can ease this burden by providing essential information for product line
implementation (Chae and Blume 2008).



CHAPTER 6 
CONCLUSION 



Software evolves by means of change. Changes may be implemented either sequentially
or in parallel. Sequential changes form a series of software releases. Some
changes carried out in parallel may also be merged back together. In this situation,
we are interested in extension mechanisms which provide a way to add extensions in
a reliable way. Some changes implemented in parallel, however, cannot be combined
together, so a single software product diverges into different versions. In this case,
multiple software versions may evolve independently although much of their code is
duplicated, which makes it difficult to maintain them. Under these circumstances,
we need a way of managing variability among multiple versions so that we can easily
manage the evolution of a set of products.

In this thesis, we propose type-safe extensible programming which takes these two
dimensions into consideration. In particular, our language provides type-safe
extensibility mechanisms at multiple levels of granularity, from the fine degree (at the core
expression level) to the coarse degree (at the module level). At the same time, in order
to manage variability, we adopt product line engineering as a development paradigm
and then show how our extensibility mechanisms can be used to implement a set of
products:

• In Section 3, we propose a core language that supports polymorphic extensible
records, first-class cases and type-safe exception handling. With cases being
first-class and extensible, we show that our language enables a very flexible
style of composable extension;

• In Section 4, we propose a module system that makes extensible programming at
the module level possible. We also show how to compile each module separately
in the presence of all of the above features;

• In Section 5, we propose a development process which adopts product line
engineering in order to manage variability in a family of systems. We show that
our extensibility mechanisms can be put to good use in the context of product
line implementation.

We are continuing this work in several ways. First, we plan to improve our type
system. For example, we have constructed a prototype compiler for MLPolyR that
supports all of the MLPolyR features as well as mutable record fields. Records with
mutable fields have identity, and allocation of such a record is a side-effecting
operation. However, mutable data types can weaken our polymorphic type system in
situations where the so-called value restriction prevents row type variables from being
generalized (Pessaux and Leroy 1999). Pessaux and Leroy present an example
that shows such a false positive:

1 let val r = {| i = fn x => x+1 |} 

2 fun f y c = if c then r ! i y 

3 else raise 'Error () 

4 in r ! i 

5 end 

First, r has type $\{|\, i : \mathrm{int} \xrightarrow{\rho} \mathrm{int} \,|\}$, where
$\rho$ is not generalized since the whole expression is not a syntactic value (Line 1).
Then, during the typing of f, the true branch with r!i y (Line 2) is unified with the
false branch with raise 'Error() (Line 3). Therefore, $\rho$ becomes
Error(); $\rho'$ and the application r!i falsely appears to raise Error() even though it
does not (Line 4). Pessaux and Leroy suggest that this false positive could be avoided
with more precise tracking of the flow of exceptions.



Additionally, as we discussed in Section 4.3, non-generalized unification variables
in the presence of mutable references make our type system unsound unless they are
instantiated consistently across module boundaries. We plan to add a consistency
checking mechanism at link time.

Second, our module system does not require any type decoration, since the type
system infers module signatures as it infers types of core expressions. However, there
will be a need for programmers to spell out types. For example, module signatures in
libraries are generally required to be explicit. We plan to support explicit specification
of module signatures and conventional signature matching as in SML. However, there
can be situations where row types and kind information make it difficult to specify
full typing information. As we have seen in Section 4.3, we might ask programmers
to write the following type decoration for map:

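val map : $\forall \alpha : \star.\ \forall \beta : \star.\ \forall \gamma : \varnothing.\ \forall \delta : \varnothing.\ (\alpha \xrightarrow{\gamma} \beta) \xrightarrow{\delta} ([\alpha] \xrightarrow{\gamma} [\beta])$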

It is possible to avoid this excessive notational overhead by defining a little
language with good built-in defaults (e.g., abbreviations for common patterns). Then,
programmers would specify their intentions using this language, and these intended
types can be checked against inferred types in the style of software contracts
(Findler and Felleisen 2002; Blume and McAllester 2006). For example, we may specify
map's type as follows, and all elided parts can be inferred and checked by a compiler:

val map : $(\alpha \to \beta) \to ([\alpha] \to [\beta])$
Third, we plan to integrate feature composition with our language. Our work
shows that modern programming language technology such as extensible cases and
parameterized modules is powerful enough to manage variability identified by product
line analysis. However, in our approach, the relations among features, architectures,
and components are expressed only implicitly, during the product line analysis.
Similarly, most feature-oriented programming languages do not have the notion of
a "feature" in the language syntax, since features are merely considered conceptual
abstractions rather than concrete language constructs. Therefore, these languages
cannot state the relations between a feature and its corresponding code segments in
the program text (Apel et al. 2008). However, other product line model-based methods
usually provide a way to express those relations explicitly by using CASE tools.
In FORM, for example, those explicit relations make it possible to automatically
generate product code from specifications (Kang et al. 1998).



In our recent work, we propose a macro system for MLPolyR, which augments
the language with an explicit notion of features (Chae and Blume 2009). We
implemented this mechanism in order to make it possible to write feature composition
in terms of features. Then, the compiler can integrate the corresponding code
automatically once we provide a valid feature set. Since our expansion rules do not
support any specification of feature relationships (i.e., mutually exclusive or required
relations), however, the MLPolyR compiler cannot detect invalid feature sets.
We leave such validation to feature modeling tools, which provide various diagnoses
on feature models. Our goal is to let a front-end modeling tool generate valid expansion
rules in the MLPolyR language so that an application can be assembled automatically
by feature selection alone.